Extracting DVD Subtitles

Software name : Avidemux
Software version : 2.4

If you want to extract subtitle files from a DVD, it helps to understand a little about how they work. Subtitles on DVDs are contained in VOB files along with the main video and audio streams. We call all of these streams here to distinguish them from self-contained files: a single file can include several streams.

The subtitles you see on a DVD are streams of image files which appear one after the other. Each stream displays a different language. When we extract these subtitle streams, the handiest format to save them in is a text file that records the timecode at which each piece of text appears. A subtitle file in text rather than image format is easier to edit and translate. You can also easily send that file via the internet or put it on a website for others to download.

In order to create a text-based subtitle file we first need to extract the image files from the DVD to two files:

  1. an *.idx file, which has the timecodes of the image subtitles (this is called a VobSub file)
  2. a *.sub file, which contains the image information

We can then convert those files into a single text-based subtitle file. There are many different formats, but Avidemux uses a widely compatible one with the '.srt' extension.
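
To make the role of the .idx file a little more concrete, here is a minimal Python sketch that reads the timecodes out of index text. It assumes the "timestamp: HH:MM:SS:mmm, filepos: ..." entry layout commonly found in VobSub index files; that layout is not described in this manual, and the sample entries are made up for illustration.

```python
import re

# Assumed entry layout: "timestamp: HH:MM:SS:mmm, filepos: <hex offset into the .sub file>"
ENTRY = re.compile(
    r"timestamp:\s*(\d+):(\d+):(\d+):(\d+),\s*filepos:\s*([0-9a-fA-F]+)"
)

def read_idx_entries(idx_text):
    """Return (start_ms, sub_file_offset) pairs found in the text of an .idx file."""
    entries = []
    for line in idx_text.splitlines():
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # comments, settings and language headers are skipped
        hours, minutes, seconds, millis = (int(g) for g in match.groups()[:4])
        start_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
        entries.append((start_ms, int(match.group(5), 16)))
    return entries

# Hypothetical sample, not taken from a real disc.
sample = """# hypothetical index file
id: en, index: 0
timestamp: 00:00:10:991, filepos: 000000000
timestamp: 00:00:18:565, filepos: 000001800
"""
print(read_idx_entries(sample))
```

Each pair says when a subtitle image appears and where its image data starts in the .sub file, which is why the two files always travel together.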

Note : The screenshots in the following explanation are a mix of Ubuntu (Linux) and Windows. Avidemux works well on both, and the interface looks the same except for a few color differences.

Extracting to an idx / VobSub file

From the Tools menu select 'VOB' and then 'VobSub' 


Then you should see the following screen asking you to Browse for three things.

  1. VOB file(s)
  2. IFO file
  3. VobSub file


Finding the VOB Files

When you click on the first 'Browse' button in the above image, you are asked to browse for the VOB files :


However, sometimes it's not that clear where they are. If you are working from a DVD, the files we want are in a folder on the disc called VIDEO_TS.

Normally for a short film there is only one VOB file with video data in it. For longer films there is normally more than one, because there is a maximum file size for the VOB files.

Let's have a look at a complicated DVD structure. Some small entries in the structure are system files and menu files - we should ignore these. The files containing the video, audio and subtitle streams we need are the big ones. They have names like VTS_02_1.VOB, VTS_02_2.VOB, VTS_02_3.VOB, VTS_02_4.VOB. If you click 'Browse' next to 'VOB Files' and browse to the appropriate directory ('VIDEO_TS'), you should see something like this :


For this task we need to select the first big VOB which in this case is VTS_02_1.VOB. The ones following it will be selected automatically. When you have selected the right one click on 'open' :
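
If the folder listing is long, the rule of thumb above ("pick the first of the big VOB files") can be sketched in Python. This is only an illustration: it assumes the VTS_<set>_<part>.VOB naming seen in this example, treats part 0 as a menu file to ignore, and the file sizes in the listing are invented.

```python
import re
from collections import defaultdict

# Assumed naming: VTS_<two-digit set>_<part digit>.VOB
VOB_NAME = re.compile(r"^VTS_(\d{2})_(\d)\.VOB$", re.IGNORECASE)

def first_vob_of_largest_title_set(sizes):
    """sizes maps file name -> size in bytes; return the first VOB of the biggest set."""
    totals = defaultdict(int)
    parts = defaultdict(list)
    for name, size in sizes.items():
        match = VOB_NAME.match(name)
        if match is None or match.group(2) == "0":
            continue  # not a title-set VOB, or part 0 (assumed to be a menu file)
        title_set = match.group(1)
        totals[title_set] += size
        parts[title_set].append((int(match.group(2)), name))
    if not totals:
        return None
    biggest = max(totals, key=totals.get)
    return min(parts[biggest])[1]  # lowest part number in the biggest set

# Hypothetical listing of a VIDEO_TS folder (sizes are made up).
listing = {
    "VIDEO_TS.VOB": 200_000,
    "VTS_01_0.VOB": 50_000,
    "VTS_01_1.VOB": 30_000_000,
    "VTS_02_0.VOB": 80_000,
    "VTS_02_1.VOB": 1_073_000_000,
    "VTS_02_2.VOB": 1_073_000_000,
    "VTS_02_3.VOB": 1_073_000_000,
    "VTS_02_4.VOB": 400_000_000,
}
print(first_vob_of_largest_title_set(listing))  # prints VTS_02_1.VOB
```

On a real disc the dictionary could be built from the folder contents, for example with pathlib and each file's st_size. The biggest set is usually the main film, but it is worth checking by eye.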


Locating the IFO file

If you click on the second button :


you will be asked to look for the IFO file. The IFO file has information on what language each subtitle stream is in, so we need to browse to find this file. If there is more than one IFO file on the DVD, we need to find the one whose name begins the same way as the large VOB files. In this case it is VTS_02_0.IFO.
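
The naming rule can be written down as a small Python sketch. It assumes the pattern seen in this example (same 'VTS_02' beginning, part number 0, '.IFO' extension) and is only meant as a memory aid, not as a description of what Avidemux does internally.

```python
import re

def ifo_name_for(vob_name):
    """Map a title-set VOB name such as VTS_02_1.VOB to the matching IFO name."""
    match = re.match(r"^(VTS_\d{2})_\d\.VOB$", vob_name.upper())
    if match is None:
        raise ValueError(f"not a title-set VOB name: {vob_name!r}")
    # Same beginning as the VOB files, with part number 0 and the IFO extension.
    return match.group(1) + "_0.IFO"

print(ifo_name_for("VTS_02_1.VOB"))  # prints VTS_02_0.IFO
```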

When you have found it click on 'open' :


Select where to save the VobSub files

The third button :


will ask you to browse for a place to save the VobSub file. When you have found the right directory, type a name for the file in the box next to 'Name:' and make sure it ends with '.idx'. Below is an example (you can use any name, 'subs' is just my example) :


When you have done this, and if the other two boxes are complete, press 'Save' :


Saving your files

When you have found or selected all the files, click 'OK' to close the small window with the 'Browse' buttons :


and you'll get a window telling you how long the process will take.


When this process is complete you will have created a new .idx file and a new .sub file. These will be saved in the directory you chose for saving the .idx file. In my case I saved them to the desktop :


Making the '.srt' File

Now we want to merge the .idx file and the .sub file into a '.srt' file. Click on 'Tools' in the top menu and then 'OCR (VobSub -> Srt)':


You should see a window titled 'MiniOCR'. 


Click on the 'Open' button under 'VobSub'. You will then see a window called 'VobSub Settings'.


Click on 'Select .idx' and browse for and select the idx file you created in the 'Extracting to an idx / VobSub file' section.


Click on 'Open' when you have selected the idx file. You should return to the 'VobSub Settings' window :


If the DVD you are using has more than one language, they should all be listed in the 'Select Language' drop-down box. Select the language you want to create a subtitle file for.


When you have the right language selected click 'OK', and you should return to the 'MiniOCR' window. Now you need to select a place on your computer to save the target *.srt file to. Click on the 'Save' button in the 'Output srt' section :


You will see a window asking you to choose a folder to save the srt file in.


Browse until you find the right place. When you have, give the file a name by typing it in the box at the top.


Make sure the name ends in '.srt' and then click 'Save'.


Now that you have set your input and output files, you can start converting the image files into a text file. This process is called OCR. Click 'Start OCR'.


You should see a window like this: 


The OCR (Optical Character Recognition) process needs you to tell it what the characters (letters, numbers and symbols) in the subtitles are. It will display a character from the image subtitle, and you then tell the application what the corresponding text character is. Avidemux will show you a phrase and one character from that phrase, like this:


Now you must type the right character in the empty text field.


You do this because it is more accurate for you to specify exactly what the characters are than for the application to guess.

Where it says 'Current Glyph Text:' and shows an image of a character, enter that character using the keyboard in the box below and then click 'OK'. Capital and lower case letters are treated as different characters, so type exactly what you see. This process is also very unforgiving at the moment: there is no undo option, so don't get it wrong!

Sometimes two characters will be selected at once. In that case, enter both characters and press Enter. This may seem to take a long time at first, but once you have entered all the letters and numbers the program should fly through the rest of the subtitles. You should be able to process a 90 minute film in 5-10 minutes.
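
Why does the process speed up? A toy Python sketch gives the idea: every new glyph shape is asked about once, stored, and reused from then on. This is an illustration only, not Avidemux's actual implementation; the 3x3 "bitmaps" and the ask function are hypothetical stand-ins for the real subtitle images and the dialog box.

```python
def recognise(glyphs, known, ask):
    """Turn a sequence of glyph bitmaps into text, asking only about unseen glyphs."""
    text = []
    for glyph in glyphs:
        if glyph not in known:
            known[glyph] = ask(glyph)  # one prompt per new shape
        text.append(known[glyph])
    return "".join(text)

# Hypothetical 3x3 bitmaps standing in for subtitle glyphs.
M = ("#.#", "###", "#.#")
I = (".#.", ".#.", ".#.")
C = ("###", "#..", "###")
K = ("#.#", "##.", "#.#")

answers = {M: "M", I: "i", C: "c", K: "k"}
prompts = []  # records every time the "user" is asked

def ask(glyph):
    prompts.append(glyph)
    return answers[glyph]

known = {}
print(recognise([M, I, C, K], known, ask))  # four new shapes, four prompts
print(recognise([K, I, C, K], known, ask))  # all shapes known, no prompts
print(len(prompts))
```

The stored answers are also why a wrong entry is so costly: the same mistake would be repeated every time that shape appears.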

When you are finished, the '.srt' file you saved will have the right timecode and subtitle information in it. You can open it with a text editor and it should look something like this:

00:00:10,991 --> 00:00:13,991
 Mick Jagger
00:00:18,565 --> 00:00:21,565
 - Mick Jagger
 - Thank you
00:00:32,479 --> 00:00:35,479
 - Man: Mick Jagger.
 - ( police radio squelch )
00:01:04,778 --> 00:01:06,011
 one minute! one minute!
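
Because the result is plain text, it is easy to process with a script. The following Python sketch reads timing lines of the form shown above and collects the text lines under each one. It assumes the "HH:MM:SS,mmm --> HH:MM:SS,mmm" layout from the sample; the function names are made up for this example.

```python
import re

TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def to_ms(hours, minutes, seconds, millis):
    """Convert the four timecode fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)

def parse_cues(text):
    """Return a list of (start_ms, end_ms, [text lines]) tuples."""
    cues = []
    for line in text.splitlines():
        stripped = line.strip()
        match = TIMING.match(stripped)
        if match:
            fields = match.groups()
            cues.append((to_ms(*fields[:4]), to_ms(*fields[4:]), []))
        elif cues and stripped and not stripped.isdigit():
            # Blank lines and bare numbers (cue counters in some .srt files)
            # are skipped; a subtitle made only of digits would be skipped too.
            cues[-1][2].append(stripped)
    return cues

sample = """00:00:10,991 --> 00:00:13,991
 Mick Jagger
00:00:18,565 --> 00:00:21,565
 - Mick Jagger
 - Thank you
"""
for start, end, lines in parse_cues(sample):
    print(start, end - start, lines)
```

From here it is a small step to shift all the timecodes, or to swap the text lines for a translation while keeping the timing.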