Open Translation Tools

Text Content

Whether you are translating a blog post, a news article, or a transcript of audio or video content, translation almost always means rendering text from one language into another. Fundamentally, the role of a translator is to read text in one language, such as a news article in Chinese, and write the same content in another language, such as German or Luganda. Every translator and every translation community eventually develops a workflow that makes the process of publishing content in multiple languages faster and more efficient.

While some translators work without any tools, the vast majority use one or more of the following.

Types of Tools

  1. Dictionary - A translation, or bilingual, dictionary lists suggested translations of individual words (and sometimes phrases) from one language to another. The largest and most popular open translation dictionaries are http://wiktionary.org/, http://www.omegawiki.org, and http://open-dictionary.com/
  2. Machine Translation - Machine translation (MT) uses computers to translate text from one language to another. Machine translation is not available for all language pairs, and the resulting translations tend to be most accurate when working with languages from the same language family. Open source machine translation tools include http://www.apertium.org/ and http://www.statmt.org/moses/.
  3. Glossary - Unlike a dictionary, which aims to define and translate every major word in a given language, a translation glossary or terminology list includes only special terms that require specific translations. Translators use glossaries when:
    • the source text contains special terminology or jargon, as in advanced scientific, medical, or technical texts.
    • the translation is part of a larger set of documents that should maintain consistent terminology across all documents.
    While a format like TBX (TermBase eXchange) exists for glossaries, many tools use simple tab-delimited or comma-separated (CSV) files; the latter can be opened in a spreadsheet if needed. A collection of open glossaries organized by language is available at http://www.lai.com/glossaries.html
  4. Translation Memory - As a translator works on an expanding body of documents, it is likely that she will come across exactly the same phrases over and over again. Each time a translation memory system comes across a phrase or group of words that has already been translated, it will automatically replace the phrase with the previous translation. Open source translation memory systems include OmegaT (http://www.omegat.org/) and Anaphraseus (http://anaphraseus.sourceforge.net/). Google Translator Toolkit (http://translate.google.com/toolkit) and Lingotek (http://www.lingotek.com/) are proprietary translation memory systems.
  5. Spell Check and Grammar Check - Finally, many translators use spell checkers to help with editing and to ensure consistent spelling of translated words. Most open source tools use Hunspell to provide spell checking. http://jazzy.sourceforge.net/ is a server-based spell checker for projects hosted online.
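
The glossary entry above mentions that many tools keep terminology in simple comma-separated files. As a rough illustration only, the Python sketch below loads such a file and flags source terms whose required translation is missing from a draft. The column names (source, target), the sample English-to-German terms, and the helper names load_glossary and missing_terms are all assumptions made for this example; they do not come from any particular tool.

```python
import csv
import io

# Hypothetical English-to-German glossary in comma-separated form.
# In practice this text would be read from a .csv file on disk.
GLOSSARY_CSV = """source,target
translation memory,Übersetzungsspeicher
open source,Open Source
spell checker,Rechtschreibprüfung
"""

def load_glossary(text):
    """Return a dict mapping lower-cased source terms to target terms."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["source"].lower(): row["target"] for row in reader}

def missing_terms(source, translation, glossary):
    """List glossary terms that occur in the source but whose required
    translation does not appear in the translated text."""
    source_lc = source.lower()
    translation_lc = translation.lower()
    return [
        (term, target)
        for term, target in glossary.items()
        if term in source_lc and target.lower() not in translation_lc
    ]

glossary = load_glossary(GLOSSARY_CSV)
problems = missing_terms(
    "The spell checker is open source.",
    "Die Rechtschreibkorrektur ist Open Source.",
    glossary,
)
print(problems)  # [('spell checker', 'Rechtschreibprüfung')]
```

The check is a plain substring match, so it will miss inflected forms; a real terminology checker would need language-aware matching.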

Types of Workflows

Ad Hoc Translation Workflows

Most translators use one or more of the above-listed tools in an ad hoc fashion. For example, an English-to-Bengali translator might use Anubadok (http://anubadok.sourceforge.net/) for machine translation, http://www.ittefaq.com/dict/ as a dictionary, and Ankur's Firefox plugin (http://www.ankur.org.bd/) as a spell checker.

Not every tool is available for every language. For example, while no machine translation system translates between Malagasy and English, there is an online Malagasy dictionary with English and French translations (http://malagasyworld.org/bins/alphaLists?lang=mg).

Integrated Translation Workflow Systems

Professional translators tend to use integrated translation workflow systems, which bring all five types of translation tools into one integrated interface. OmegaT and Virtaal, for example, are desktop applications that integrate both translation memory and glossaries into a single translation workspace.
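
To make the translation memory component of such a workspace more concrete, here is a minimal Python sketch. It is not how OmegaT or Virtaal are implemented. The class name TranslationMemory, the 0.75 similarity threshold, and the use of Python's difflib for approximate ("fuzzy") matching are assumptions chosen for illustration; the text above only describes reuse of phrases that have already been translated.

```python
import difflib

class TranslationMemory:
    """Stores previously translated segments and suggests matches."""

    def __init__(self):
        self.segments = {}  # source segment -> stored translation

    def add(self, source, target):
        self.segments[source] = target

    def suggest(self, source, threshold=0.75):
        """Return (stored_source, translation, score) for the best match,
        or None when nothing scores at or above the threshold."""
        # An exact match is always preferred and scores 1.0.
        if source in self.segments:
            return source, self.segments[source], 1.0
        best = None
        for stored, target in self.segments.items():
            score = difflib.SequenceMatcher(None, source, stored).ratio()
            if score >= threshold and (best is None or score > best[2]):
                best = (stored, target, score)
        return best

tm = TranslationMemory()
tm.add("Save the file.", "Datei speichern.")
tm.add("Open the file.", "Datei öffnen.")

exact = tm.suggest("Save the file.")        # identical segment
fuzzy = tm.suggest("Save the files.")       # close, but not identical
nothing = tm.suggest("Print the document.") # too different to reuse
```

In this sketch an exact match can be reused as-is, while a fuzzy match is only a suggestion that the translator still has to review and adapt.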

Though not open source, Google also offers Translator Toolkit (http://translate.google.com/toolkit), which integrates machine translation, glossaries, translation memory, and dictionaries (explanatory video: http://www.youtube.com/watch?v=C7W2NJFdoIg). Trados (http://www.trados.com/) is another proprietary integrated translation workflow system.

Worldwide Lexicon (http://www.worldwidelexicon.org/) is still in its early development stages but aims to offer an open source, server-based integrated translation workflow system similar to Google Translator Toolkit.

Publishing and Organizing Translations

The most revolutionary aspect of the internet has been enabling users to quickly publish and distribute content. In fact, the only three obstacles to potentially distributing content to every single human around the world are (1) access to the internet, (2) literacy, and (3) language. Once a document has been translated, there are various ways to link, integrate, and manage the translations. Creating links between translations of a text also creates links between the comments that follow in each language. In addition, if the source text is corrected, updated, or edited, then ideally those changes are also made to each of the translations.
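
One way to keep track of which translations need attention after the source text changes is to store, alongside each translation, a fingerprint of the source version it was made from. The Python sketch below illustrates the idea under stated assumptions: the Article class, its method names, and the sample sentences are hypothetical and are not taken from any of the publishing systems discussed in this chapter.

```python
import hashlib

def fingerprint(text):
    """Short, stable digest of a text, used to notice later edits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]

class Article:
    """A source text plus its translations, keyed by language code."""

    def __init__(self, lang, text):
        self.lang = lang
        self.text = text
        self.translations = {}  # lang -> (translated text, source digest)

    def add_translation(self, lang, text):
        # Remember which version of the source this translation reflects.
        self.translations[lang] = (text, fingerprint(self.text))

    def update_source(self, new_text):
        self.text = new_text

    def stale_languages(self):
        """Languages whose translation was made from an older source."""
        current = fingerprint(self.text)
        return sorted(
            lang
            for lang, (_, digest) in self.translations.items()
            if digest != current
        )

post = Article("en", "The river flooded on Monday.")
post.add_translation("es", "El río se desbordó el lunes.")
post.add_translation("de", "Der Fluss trat am Montag über die Ufer.")

# The author corrects the source; only Spanish is brought up to date.
post.update_source("The river flooded on Tuesday.")
post.add_translation("es", "El río se desbordó el martes.")

print(post.stale_languages())  # ['de']
```

A digest only signals that the source changed, not what changed; showing translators the actual difference would require keeping the earlier source versions as well.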

Blogs / Ad Hoc Translation

The simplest solution is to invite your readers to translate posts, re-post them on their own blogs, and cross-link between the original and the translations. It is best to publish your content under a license that allows derivative works so that your readers know they are free to produce and republish translations of what you write. For example, Kevin Kelly published an essay on his blog called "The Expansion of Ignorance" (http://www.kk.org/thetechnium/archives/2008/10/the_expansion_o.php). Enzo Abbagliati, a Chilean blogger, then translated the post into Spanish, published it on his own blog, and left a comment on Kelly's blog post to make him aware of the translation (http://abbagliati.blogspot.com/2008/10/la-expansin-de-la-ignorancia.html). The essay was also translated into Chinese by another reader, and Kelly later edited his post to point readers to all available translations. This method requires no specialized software, nor does it require that everyone use the same set of tools. You can publish your blog on WordPress, for example, while someone else translates the post on their Blogger-based site. This strategy works nicely for writers or blogs that have a following and that publish on a light or occasional basis.

Multilingual CMS

A number of open content websites are now publishing their blog posts, news stories, and essays in multiple languages. Examples include http://globalvoicesonline.org/, http://www.cafebabel.com, http://www.eurotopics.net, http://vocesbolivianas.org, and http://www.indymedia.org.

Until recently, most content management systems had mediocre and unstable support for multilingual publishing. This is beginning to change as open source CMSs such as Drupal, Joomla, WordPress, Plone, and Tribiq now allow for the localization of the interface into multiple languages and multilingual publishing. These systems do not yet integrate with translation workflow systems, which must be managed separately, but they do enable you to give translators access to your publishing system so that they can publish, manage, and edit translations of content.

Social / Community Translation System

Social translation platforms encourage a community of translators to create, edit and curate translations from a variety of sources. Examples include Meedan (http://meedan.net), where volunteers translate discussions around current events in the Middle East into Arabic and English, and Yee-Yan, where volunteers translate news-related content into English and Chinese. Social translation systems tend to build a community of volunteer and professional translators around a specific topic, interest, or mission.

Wikis

Wikis are widely used for multilingual content projects like Wikipedia (http://wikipedia.org), which exists in over 250 languages, and WikiTravel (http://wikitravel.org), a free travel guide available in over a dozen languages.

Wikipedia/MediaWiki

MediaWiki, the software created by the Wikimedia Foundation (http://wikimediafoundation.org/) to run Wikipedia, now has improved multilingual support. Newer tools, such as TikiWiki, have built translation processes into the system itself. This is an area where we expect much growth and evolution over the next two to three years.