Text2Speech Voices
UpStage’s speech is generated by the Festival Speech Synthesis System, developed at the Centre for Speech Technology Research at the University of Edinburgh (http://www.cstr.ed.ac.uk/projects/festival/).
An avatar's voice is selected from a dropdown menu when adding a new avatar, and can be changed later in the "Manage avatars" section (see section 3.2.3). There are currently about 100 voices on the UpStage server (http://upstage.org.nz:8084); if you are setting up your own UpStage server, please see the technical documentation on installing voices.
Voices currently available on UpStage
The voices currently available with UpStage follow a file-naming scheme that describes what kind of voice each one is. Some of the voices speak English with a foreign accent, some speak English with various English accents, and some are designed to speak other languages. The software defaults to a male voice, but we have endeavoured to make female versions of most voices.
You can test the voices on the Avatar Edit screen by selecting different voices from the dropdown menu and entering the text you want to test.
The format is: ["e" or "emb"] _ [native language] - [en] - [modifications]
For example:
e_de – speaks and reads German
e_en – speaks and reads English
e_en-fast-f1 – speaks English quickly, in a female voice
e_en-wm – speaks English in a West Midlands accent
Other accents in the e_en series are "n" for north, "sc" for Scots, "rp" for RP (Received Pronunciation) and "r" for rhotic (which means it pronounces the r in words like "church").
emb_af1 – speaks and reads Afrikaans
emb_af1-en – speaks English in an Afrikaans accent
emb_de4-en-low-slow – speaks English, low and slow, with a German accent
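The naming format above can be taken apart mechanically. The following is only a minimal sketch, not part of UpStage: the names VoiceName and parse_voice_name are hypothetical, and it assumes that a language code starting with "en" means the voice speaks English even when no separate "-en" marker is present (as in e_en-fast-f1).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoiceName:
    prefix: str               # "e" or "emb"
    language: str             # native language code, e.g. "de", "af1", "en"
    speaks_english: bool      # True for English voices and "-en" accent voices
    modifications: List[str]  # e.g. ["low", "slow"]


def parse_voice_name(name: str) -> VoiceName:
    """Split a name such as 'emb_de4-en-low-slow' into its parts."""
    prefix, sep, rest = name.partition("_")
    if not sep or prefix not in ("e", "emb"):
        raise ValueError(f"not an e_/emb_ style voice name: {name!r}")
    parts = rest.split("-")
    language, tail = parts[0], parts[1:]
    # An "en" directly after the language means "speaks English with this accent".
    has_en_marker = bool(tail) and tail[0] == "en"
    if has_en_marker:
        tail = tail[1:]
    # Assumption: codes beginning with "en" are English voices themselves.
    speaks_english = has_en_marker or language.startswith("en")
    return VoiceName(prefix, language, speaks_english, tail)


print(parse_voice_name("emb_de4-en-low-slow"))
```

Voices that do not follow the scheme (for example roger or slow) are rejected with a ValueError rather than guessed at.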
We are in the process of compiling descriptions for all the voices; the information gathered so far is:
Voice | Male | Female | Accent | Non-Eng | Description
awb_cmu | X | | Scottish | | Soft, slightly muffled
awb_nitech | X | | Scottish | | Clear, not very deep
bdl_cmu | X | | English? | | A little bit quavery
bdl_nitech | X | | English? | | Firmer than bdl_cmu, a bit higher, but clearer
bud | X | | NZ? | | Deep, calm
clb_nitech | | X | NZ? | | Robotic, soft
crunchy | | | | | Crunchy - good for witches & effects
default | | X | NZ? | | Smooth, young
e_en-fast-f1 | X | | NZ? | | fast, boyish
e_en-r-f3 | X | | NZ? | | fast, boyish
e_en-wm-slow | X | | Australian?? | | nasal drawl
e_en-wm-slow-f3 | X | | Australian?? | | boyish nasal drawl, computerish & like a learner-reader reading
e_eo | X | | | foreign |
emb_de4 | X | | German | German | neutral German male
emb_de4-en | X | | German | | mid-range, clean, English w/German accent
emb_de4-en-low-slow | X | | German | | pimp's voice: low & lecherous (English w/German accent)
emb_de5 | X | | German | German | slow low somewhat distorted voice
emb_de5-en | X | | German | | slow high somewhat distorted voice, English w/German accent
emb_de5-en-high-slow | X | | German | | mid-high slightly strangulated male, English w/German accent
emb_de7 | X | | German | German | middle somewhat slow and drawn out male, German
emb_en1-high | X | | English | | soft mid-range male voice
emb_fr1-en-low | X | | European | | low & lecherous
emb_fr4-en-high-slow | X | | European | | mid-high male voice, sounds like he has trouble speaking
emb_hu1-en-slow | | X | European? | | low soft female voice with slight European accent
emb_nl2 | X | | | Dutch | mid-low male
emb_nl2-en | X | | European? | | mid-low male voice with slight European accent
emb_pl1 | | X | | Polish? | mid-low calm female
emb_pl1-en | | X | Polish? | | mid-low calm female with European accent
emb_ro1-en | X | | | |
emb_sw1-en-fast | X | | Swedish? | | mid-low male speaking quickly
emb_sw2-en-high-slow | | X | Swedish? | | mid-high female with European accent
high | X | X | Computer | | boyish computer monotone
rms-faster | X | | American | |
rms-nitech | X | | American | | Deeper than roger, clear, little bit emphatic
roger | X | | English | | Thin, proper-sounding, not deep
slow | X | | computer | | gets slower & lower, very good for effects
slt-cmu | | X | American | | Flat, slightly muffly
slt-nitech | | X | American | | Flat, a bit clearer & stronger than slt-cmu
Adding more voices
You can install additional speech plug-ins on your own server to extend the range of voices available to the avatars. As long as you don't mind working with the source code a little, it's not difficult. Patricia Jung explains how she did it (on Linux, using UpStage V1; note that these instructions are now several years old):
Just add another entry in the VOICES section in Upstage/upstage/voices.py like:
#txt2pho/mbrola:
'de1': ("| /usr/local/mbrola/pipefilt | /usr/local/mbrola/preproc /usr/local/mbrola/Hadifix.abk /usr/local/mbrola/Rules.lst | /usr/local/mbrola/txt2pho -p /usr/local/mbrola/data/ |/usr/local/mbrola/mbrola /usr/local/mbrola/de1/de1 - -",
_fest),
I know, it looks awful but this is only because the command is an awful chain consisting of four commands with a couple of options each and the relevant path:
"| pipefilt ...| preproc ... | txt2pho ... | mbrola ..."
It does some preprocessing (like replacing every occurrence of "z.B." with "zum Beispiel"), then hands the resulting text over to txt2pho and then to mbrola.
As long as your command or command chain reads text from standard input and writes the resulting sound in raw format to standard output (Unix stuff, ask me if you haven't heard about it), you can put whatever you like between the opening "| and the closing ".
The above mentioned awful command chain will work when one has installed the txt2pho frontend; it uses the de1 female mbrola voice, and you can choose it in the web interface using the name de1.
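To try out a command chain before adding it as a voice, you can feed it text on standard input and look at what comes out on standard output. The sketch below is an illustration only and is not how UpStage itself runs voices: run_voice_chain is a hypothetical helper, the leading "|" is stripped on the assumption that the chain is normally appended to something that supplies the text, and the demo chain uses tr and cat as harmless stand-ins for a real synthesiser. It needs a Unix-like shell.

```python
import subprocess


def run_voice_chain(chain: str, text: str) -> bytes:
    """Feed text to a shell command chain on stdin and return its raw stdout."""
    command = chain.strip()
    # Assumption: voice entries begin with "|" so they can follow a text source;
    # drop it here so the chain can be run on its own.
    if command.startswith("|"):
        command = command[1:].strip()
    result = subprocess.run(
        command,
        shell=True,                    # the chain relies on shell pipes
        input=text.encode("utf-8"),    # text goes in on standard input
        stdout=subprocess.PIPE,        # raw bytes come back on standard output
        check=True,                    # raise if the chain exits with an error
    )
    return result.stdout


# Stand-in chain: upper-cases the text instead of producing sound.
demo_chain = "| tr a-z A-Z | cat"
print(run_voice_chain(demo_chain, "hello"))  # prints b'HELLO'
```

With a real voice chain the returned bytes would be raw audio, so you would write them to a file or pipe them to a player rather than print them.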
The only problem with this kind of reconfiguration is that config.py isn't a nice configuration file but a Python script, so you need to know at least that Python is very picky about indentation: it's extremely important that your new voice entries have the same amount of whitespace at the beginning of the line as the other voice entries.
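One way to catch an indentation slip before restarting the server is to let Python parse the edited file without running it. This is a minimal sketch using the standard ast module; find_syntax_problem is a hypothetical helper and the two sample sources are made up for illustration, not taken from UpStage. Whether a misaligned voice entry actually breaks the file depends on how the entries are laid out in the script, which is not shown here.

```python
import ast
from typing import Optional


def find_syntax_problem(source: str, filename: str = "<voices>") -> Optional[str]:
    """Return a short description of the first syntax error, or None if it parses."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return f"{filename}:{err.lineno}: {err.msg}"
    return None


# Made-up examples: one consistently indented, one with a stray indent.
well_formed = "first = 'a'\nsecond = 'b'\n"
misaligned = "first = 'a'\n  second = 'b'\n"

print(find_syntax_problem(well_formed))
print(find_syntax_problem(misaligned))
```

To check a real file, read its contents and pass them in together with the file name, so that the reported line number points at the entry that needs fixing.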
The reason it took me so long was TTS: I failed completely and utterly to get the German Festival extensions for use with mbrola voices (http://www.ims.uni-stuttgart.de/phonetik/synthesis/festival_opensource.html) to work. Then I tried txt2pho with mbrola (http://www.ikp.uni-bonn.de/dt/forsch/phonetik/hadifix/HADIFIXforMBROLA.html), ignoring Festival, and this worked at once. A nice installation description is at http://bogmog.sourceforge.net/document_show.php3?doc_id=34.