Ogg Theora Cook Book

Embedding Subtitles

If you want your video file to contain its subtitles, so you don't have to distribute the .srt file separately, you need to embed the subtitle file into the video. The video encoding tool ffmpeg2theora has a few command-line options for including subtitles in your video.

ffmpeg2theora is available for most operating systems, including Windows, Mac OS X, and GNU/Linux.

Three options are important for subtitles:

  • --subtitles to point to a subtitles file in SRT format,
  • --subtitles-language to define the language of the subtitles,
  • --subtitles-encoding to specify the character encoding of the subtitles file.
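If you don't have an SRT file at hand to try these options with, the snippet below writes a minimal one. SRT is plain text. Each cue consists of:

  • a sequence number,
  • a start and end time in hours:minutes:seconds,milliseconds, separated by -->,
  • one or more lines of text,
  • a blank line.

The file name english-subtitles.srt matches the English example command later in this chapter. The dialogue is placeholder text made up for illustration.

```shell
#!/bin/sh
# Write a minimal two-cue SRT file (the dialogue is placeholder text).
cat > english-subtitles.srt <<'EOF'
1
00:00:01,000 --> 00:00:04,000
Hello, and welcome to this video.

2
00:00:05,000 --> 00:00:08,500
These lines will appear as subtitles.
EOF
```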

Let's have a closer look at the language and encoding options that ffmpeg2theora uses when embedding SRT files in a Theora video file.

subtitles-language - This option records the language of the subtitles. Every language has a standard code, which lets people identify a language whatever their own language is. For example, in English the language spoken in Germany is called German, but in Germany it's called Deutsch. To prevent confusion, the international standard ISO 639-1 represents each language with a two-letter code. The code for German, for example, is 'de'.

subtitles-encoding - This option specifies how the text of the subtitles file is encoded. The option is needed because there are many different strategies for representing the wide range of characters used by the world's languages.

For a long time, computers used 7-bit character sets, with 128 codes, to represent the alphabet and other writing symbols. For example, US-ASCII has 95 printable characters (including the space) and 33 control codes. Since then, numerous 8-bit character sets with 256 codes have appeared for alphabets and syllabaries. Several encoding systems using 16 bits have also appeared for writing systems based on Chinese characters.

However, 7 or even 8 bits is not enough space for all the typographical symbols of even one alphabet, much less for the dozens of writing systems in use today. The Unicode character set was created to support all languages at once. Its UTF-8 encoding is the preferred form for exchanging text between systems "on the wire".
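To make the difference concrete, here is the Spanish letter ñ stored in Latin-1 and in UTF-8. This is a small sketch that assumes a POSIX shell with the standard printf, iconv and od utilities. The file names are arbitrary.

```shell
#!/bin/sh
# "ñ" as a single ISO-8859-1 (Latin-1) byte: 0xF1, written here in octal (\361).
printf '\361' > n-latin1.bin

# The same character re-encoded as UTF-8 takes two bytes: 0xC3 0xB1.
iconv -f ISO-8859-1 -t UTF-8 n-latin1.bin > n-utf8.bin

# Show the raw bytes of each file in hexadecimal.
od -An -tx1 n-latin1.bin   # f1
od -An -tx1 n-utf8.bin     # c3 b1
```

The same text, one character long, is therefore one byte in Latin-1 and two bytes in UTF-8. A program that assumes the wrong encoding will misread the bytes.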

However, a lot of people still use older encodings. The trouble with these is that they overlap, using the same codes for completely different characters. Rendering text according to the wrong encoding usually produces gibberish.

By default, ffmpeg2theora expects subtitles to be in UTF-8. If they are not, you need to tell it which encoding they use. If you're writing in English, chances are your text is in ASCII, ISO-8859-1 (Latin-1), or possibly Windows code page 1252. By design, US-ASCII is a subset of UTF-8, so plain ASCII files are fine as they are.

Files that use characters beyond ASCII, such as accented letters or typographic quotes, are a different matter. Those characters are stored differently in Latin-1 or code page 1252 than in UTF-8, so such a file must either be converted to UTF-8 or declared with --subtitles-encoding.
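The overlap problem is easy to reproduce. The single byte 0xF1 means ñ in ISO-8859-1 but ё in ISO-8859-5 (Cyrillic). UTF-8 text decoded as if it were Latin-1 turns into the familiar kind of gibberish. This sketch assumes your iconv knows ISO-8859-5, as GNU iconv and most other implementations do. The file names are arbitrary.

```shell
#!/bin/sh
# One byte, 0xF1 (octal \361), decoded under two different 8-bit encodings
# and re-encoded as UTF-8 so a UTF-8 terminal can display it.
printf '\361' | iconv -f ISO-8859-1 -t UTF-8 > as-latin1.txt    # ñ (Latin small letter n with tilde)
printf '\361' | iconv -f ISO-8859-5 -t UTF-8 > as-cyrillic.txt  # ё (Cyrillic small letter io)

# The usual gibberish: UTF-8 text ("Español") wrongly decoded as Latin-1.
printf 'Espa\303\261ol' | iconv -f ISO-8859-1 -t UTF-8 > mojibake.txt  # EspaÃ±ol

# Print the three results, one per line.
for f in as-latin1.txt as-cyrillic.txt mojibake.txt; do
    cat "$f"
    echo
done
```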

Example commands for subtitle embedding

Here are a few examples that take an existing MP4 video file (input.mp4) and output an Ogg video file (output.ogv) with embedded subtitles:

If you have a subtitles file in English (the language code for English is 'en'):

ffmpeg2theora input.mp4 --subtitles english-subtitles.srt --subtitles-language en -o output.ogv

If you have a subtitles file in Spanish, encoded in Latin-1 (ISO-8859-1):

ffmpeg2theora input.mp4 --subtitles spanish.srt --subtitles-language es --subtitles-encoding latin1 -o output.ogv

There are other subtitles options for ffmpeg2theora, but these are the main ones.
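You can also convert the file to UTF-8 yourself before embedding it, instead of relying on --subtitles-encoding. This is useful, for example, if the file uses an encoding the option doesn't accept.

The sketch below does this with iconv and assumes spanish.srt is Latin-1. It uses iconv -f UTF-8 -t UTF-8 as a validity check, because iconv exits with an error on byte sequences that are not valid UTF-8. Passing that check is a strong hint rather than proof of the encoding. The name spanish-utf8.srt is arbitrary.

```shell
#!/bin/sh
# For a self-contained demo, create a stand-in Latin-1 file if spanish.srt
# doesn't exist yet ("ñ" is stored as the single byte 0xF1, octal \361).
[ -f spanish.srt ] ||
    printf '1\r\n00:00:01,000 --> 00:00:03,000\r\nEspa\361ol\r\n\r\n' > spanish.srt

# iconv rejects invalid UTF-8, so converting UTF-8 to UTF-8 works as a check.
if iconv -f UTF-8 -t UTF-8 spanish.srt > /dev/null 2>&1; then
    echo "spanish.srt is already valid UTF-8"
    cp spanish.srt spanish-utf8.srt
else
    # Assumes the file is Latin-1; change -f if you know it uses another encoding.
    iconv -f ISO-8859-1 -t UTF-8 spanish.srt > spanish-utf8.srt
fi

# spanish-utf8.srt can now be embedded without --subtitles-encoding:
# ffmpeg2theora input.mp4 --subtitles spanish-utf8.srt --subtitles-language es -o output.ogv
```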

Adding subtitles to an existing video

If you have a Theora video with no embedded subtitles, it's easy to add some without re-encoding the video. Each subtitles language is stored in the Ogg file as a separate stream, so the streams can be manipulated independently.

Internally, subtitles embedded in an Ogg file are encoded as Kate streams. ffmpeg2theora creates such streams, but they can also be created directly from an SRT file with the kateenc tool. On Ubuntu, kateenc is part of the libkate-tools package. To install it:

sudo apt-get install libkate-tools

For instance, the following creates a new English subtitles stream from an SRT file. Remember, the code for English is 'en':

kateenc -t srt -o english-subtitles.ogg english.srt -c SUB -l en

Now you've got a single subtitles stream, which you can add to your Theora video:

oggz-merge -o video-with-subtitles.ogv original-video.ogv english-subtitles.ogg

On Ubuntu, oggz-merge is part of the oggz-tools package. To install it:

sudo apt-get install oggz-tools
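The same two steps extend to several languages: create one Kate stream per SRT file with kateenc, then merge them all into the video in a single oggz-merge call. The sketch below wraps this in a shell function. The function name add_subtitle_tracks, the LANG:FILE argument convention and the subtitles-LANG.ogg file names are made up for this example. It assumes kateenc and oggz-merge are installed and that the SRT files are UTF-8.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package):
#   add_subtitle_tracks INPUT.ogv OUTPUT.ogv LANG:FILE.srt [LANG:FILE.srt ...]
# Creates one Kate stream per SRT file with kateenc, then merges them all
# into the video with a single oggz-merge call. Assumes UTF-8 SRT files.
add_subtitle_tracks() {
    video=$1
    output=$2
    shift 2
    streams=""
    for spec in "$@"; do
        lang=${spec%%:*}        # language code before the colon, e.g. "en"
        srt=${spec#*:}          # file name after the colon, e.g. "english.srt"
        kate="subtitles-$lang.ogg"
        kateenc -t srt -o "$kate" "$srt" -c SUB -l "$lang" || return 1
        streams="$streams $kate"
    done
    # $streams is deliberately unquoted so each stream file becomes its own
    # argument (the generated names contain no spaces).
    oggz-merge -o "$output" "$video" $streams
}
```

For example, add_subtitle_tracks original-video.ogv video-with-subtitles.ogv en:english.srt de:german.srt would add English and German subtitles in one pass. POSIX sh has no local variables, so the function's variables remain set in the calling shell afterwards.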

In fact, the oggz tools allow more powerful manipulation of all the tracks in a video. For example, you can use oggz-merge in the same way to add audio tracks in other languages.