Open Translation Tools

Basic Technical Concepts

Here is a quick introduction to some technical terms you are likely to encounter as a translator of digital texts. Full definitions and explanations can be found in the section "Technical Concepts" later in this book.

Fonts

Characters on a computer screen are rendered using fonts: files that contain a definition of the shape (the glyph) of each character. A given font may define characters from several different alphabets or writing systems. If a font does not support a given character, text that requires it may be displayed as question marks or boxes on the reader's screen, or the character may be substituted with one from another font.
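The fallback behaviour described above can be sketched in a few lines of Python. This is only an illustration: the font names and their character coverage are invented for the example, and real rendering systems consult the font files themselves.

```python
# Hypothetical fonts, each listed with the characters it can draw.
# Both the names and the coverage are assumptions made for this sketch.
FONTS = [
    ("LatinOnly", set("abcdefghijklmnopqrstuvwxyz ")),
    ("CyrillicFallback", set("абвгдежзийклмнопрстуфхцчшщъыьэюя")),
]

MISSING_GLYPH = "\u25a1"  # the "box" shown when no font has the character


def choose_fonts(text):
    """Pair each character with the first font that supports it."""
    result = []
    for ch in text:
        for name, coverage in FONTS:
            if ch in coverage:
                result.append((ch, name))
                break
        else:
            # No font in the list covers this character: show a box.
            result.append((MISSING_GLYPH, None))
    return result


for shown, font in choose_fonts("a ж 日"):
    print(repr(shown), font)
```

The Latin letter is drawn from the first font, the Cyrillic letter is substituted from the second, and the Japanese character, which neither font covers, comes out as a box.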

Unicode

Unicode is a standard that assigns a unique number, called a code point, to every character in the writing systems it covers. Older methods of managing characters restricted the computer to one limited set of characters at a time, so displaying Russian text and Arabic text together, for example, was often not feasible. The Unicode standard is intended to enable computers to display and print any combination of scripts together.
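As a small illustration, Python's standard unicodedata module can look up the number and the official name that Unicode gives to each character. The helper name describe is chosen for this sketch only.

```python
import unicodedata


def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


# One Latin, one Cyrillic and one Arabic letter in the same string.
mixed = "A\u0436\u0628"
for ch, code_point, name in describe(mixed):
    print(code_point, name)
# U+0041 LATIN CAPITAL LETTER A
# U+0436 CYRILLIC SMALL LETTER ZHE
# U+0628 ARABIC LETTER BEH
```

Because every character has its own number, the three scripts can sit side by side in one piece of text.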

UTF-8, UTF-16

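These are encodings: ways of storing Unicode characters as sequences of bytes in files and in memory. UTF-8 uses one to four bytes per character, and UTF-16 uses two or four. Text that is saved in one encoding and read back as another will appear garbled.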
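A short Python sketch shows how the same characters take up different numbers of bytes in the two encodings. The function name byte_lengths is made up for the example, and "utf-16-le" is used so that the count does not include the optional byte order mark.

```python
def byte_lengths(text):
    """Count the bytes needed to store text in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


for sample in ["a", "\u00e9", "\u65e5"]:  # a, e with acute accent, the kanji for "day"
    print(repr(sample), byte_lengths(sample))

# The underlying bytes can be inspected as hexadecimal:
print("\u00e9".encode("utf-8").hex())  # c3a9
```

Plain English letters need one byte each in UTF-8 but two in UTF-16, while many Asian characters need three bytes in UTF-8 and two in UTF-16.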

ISO 639 code

The International Organization for Standardization (ISO) has assigned short codes of Roman letters to represent languages, most commonly two or three letters long; for example, German is represented by de and Japanese is represented by ja. You may encounter these codes when looking at web sites with translated content, particularly in the page name or the URL.
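To show where these codes tend to appear in a URL, here is a minimal Python sketch. It assumes a tiny lookup table containing only the two codes mentioned above, and it checks just the first part of the host name and the parts of the path; the example.org addresses are placeholders, and real sites follow many other conventions.

```python
from urllib.parse import urlparse

# Only the two codes from the text; a real table would be far longer.
LANGUAGE_NAMES = {"de": "German", "ja": "Japanese"}


def guess_language(url):
    """Return the first known language code found in the URL, or None."""
    parsed = urlparse(url)
    host_labels = (parsed.hostname or "").split(".")
    path_parts = [part for part in parsed.path.split("/") if part]
    # Look at the leading host label (de.example.org) and at each
    # path segment (example.org/ja/docs).
    for candidate in host_labels[:1] + path_parts:
        if candidate.lower() in LANGUAGE_NAMES:
            return candidate.lower()
    return None


for url in [
    "https://de.example.org/wiki/Seite",
    "https://example.org/ja/docs",
    "https://example.org/about",
]:
    code = guess_language(url)
    print(url, "->", LANGUAGE_NAMES.get(code, "unknown"))
```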

Locales

Every computer user reads and writes text and runs programs in a certain locale, determined by the user's default language and geographic region. The locale includes how dates and times are displayed, the default currency, how numbers are represented, the keyboard layout, and other features.
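The effect of a locale on dates and numbers can be sketched as follows. The table below is a hand-written assumption covering two locales only; real programs would normally rely on the operating system's locale data or a dedicated library rather than a table like this.

```python
from datetime import date

# Hand-written conventions for two locales, for illustration only.
LOCALES = {
    "en_US": {"date": "{m:02d}/{d:02d}/{y}", "thousands": ",", "decimal": "."},
    "de_DE": {"date": "{d:02d}.{m:02d}.{y}", "thousands": ".", "decimal": ","},
}


def format_date(value, locale_name):
    """Write a date in the order and with the separators of the locale."""
    pattern = LOCALES[locale_name]["date"]
    return pattern.format(d=value.day, m=value.month, y=value.year)


def format_number(value, locale_name):
    """Write a number with the locale's thousands and decimal marks."""
    rules = LOCALES[locale_name]
    text = f"{value:,.2f}"  # always "1,234.56" style at this point
    # Swap the separators via a temporary placeholder character.
    text = text.replace(",", "\0").replace(".", rules["decimal"])
    return text.replace("\0", rules["thousands"])


day = date(2024, 3, 5)
for name in LOCALES:
    print(name, format_date(day, name), format_number(1234.56, name))
# en_US 03/05/2024 1,234.56
# de_DE 05.03.2024 1.234,56
```

The same date and the same number look quite different in the two locales, which is why a translated interface usually needs more than translated words.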

RTL

Right-to-Left: a direction in which text flows on a page. Arabic script is RTL. English is LTR (Left-to-Right).
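Unicode records a direction class for each character, and Python's standard unicodedata module exposes it. The sketch below guesses the direction of a piece of text from its first strongly directional character; this is a simplification of what real software does, and the function name base_direction is chosen for the example.

```python
import unicodedata


def base_direction(text):
    """Return "LTR" or "RTL" from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):  # Hebrew-style and Arabic-style letters
            return "RTL"
    return None  # only digits, spaces or punctuation were found


print(base_direction("English"))                   # LTR
print(base_direction("\u0639\u0631\u0628\u064a"))  # RTL (the Arabic word for "Arabic")
```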

Bi-Directional Text

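The placement of both RTL and LTR text on the same page, or even on the same line. This raises complex issues for many kinds of software, because the order in which characters are displayed can differ from the order in which they are stored.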
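To give a feel for the problem, the following Python sketch splits a line into runs of the same direction. It is a deliberately simplified assumption of how this works: spaces and punctuation are attached to whichever run came before them, whereas the full Unicode bidirectional algorithm has many more rules.

```python
import unicodedata


def strong_direction(ch):
    """Return "LTR", "RTL", or None for characters with no strong direction."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "LTR"
    if bidi_class in ("R", "AL"):
        return "RTL"
    return None


def directional_runs(text):
    """Group consecutive characters that share a direction."""
    runs = []
    for ch in text:
        direction = strong_direction(ch)
        if runs and (direction is None or direction == runs[-1][0]):
            runs[-1][1] += ch  # neutral or same direction: extend the run
        else:
            runs.append([direction, ch])
    return [(direction, chunk) for direction, chunk in runs]


# An English word followed by a Hebrew word (stored in reading order).
line = "abc \u05e9\u05dc\u05d5\u05dd"
for direction, chunk in directional_runs(line):
    print(direction, len(chunk))
```

Software that displays this line has to lay out the first run from left to right and the second from right to left, while keeping cursor movement and text selection sensible across the boundary.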

Input methods

Some languages are easier to write on a standard computer keyboard than others. Ideograph-based writing systems, for example, have far too many characters to assign one to each key, so they require some other way of entering text; these are called "input methods". They range from typing a representation of the text in Roman characters to assigning calligraphic strokes to specific keys.
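The first kind of input method, typing in Roman characters, can be sketched in Python. The table here is a tiny assumption holding just enough Japanese syllables for one word; a real input method has a complete table and usually goes on to offer a choice of ideographs for what was typed.

```python
# A very small, illustrative table of Roman spellings and Japanese kana.
ROMAJI_TO_KANA = {"ni": "に", "ho": "ほ", "n": "ん", "go": "ご"}


def romaji_to_kana(text):
    """Convert Roman letters to kana, trying the longest spelling first."""
    output = []
    i = 0
    while i < len(text):
        for length in (2, 1):
            chunk = text[i:i + length]
            if chunk in ROMAJI_TO_KANA:
                output.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            # Not in the table: pass the character through unchanged.
            output.append(text[i])
            i += 1
    return "".join(output)


print(romaji_to_kana("nihongo"))  # にほんご
```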

Keyboard layouts

In order to type text in any given language, the numbers produced when keys are pressed must be mapped to characters in the language's script, including accents, ligatures and other markings. This mapping is called a keyboard layout. Where such a mapping is infeasible, special input methods can be used (see "Input methods" above).
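A keyboard layout can be pictured as a lookup table from key numbers to characters. In the Python sketch below the key numbers are arbitrary values picked for illustration, and only three keys are listed; the characters follow the familiar difference between US and German keyboards, where Y and Z swap places.

```python
# Illustrative numbers standing for three physical keys. The values
# are assumptions for this sketch, not taken from any real keyboard.
KEY_A, KEY_B, KEY_C = 21, 44, 39

US_LAYOUT = {KEY_A: "y", KEY_B: "z", KEY_C: ";"}
GERMAN_LAYOUT = {KEY_A: "z", KEY_B: "y", KEY_C: "ö"}


def type_keys(key_numbers, layout):
    """Translate a sequence of key presses into text using a layout."""
    return "".join(layout.get(number, "?") for number in key_numbers)


pressed = [KEY_A, KEY_B, KEY_C]
print(type_keys(pressed, US_LAYOUT))      # yz;
print(type_keys(pressed, GERMAN_LAYOUT))  # zyö
```

The same three key presses produce different text depending on which layout is active, which is why typing with the wrong layout selected gives unexpected characters.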