Plum DEV Reference Manual

Plum DEV v. 3.0

© 2013 Plum Group, Inc. All rights reserved.

4. TTS Speech Engine Characteristics

4.1 Voice Tag Attributes

<gender>:

AT&T Natural Voices, Cepstral Engine:

This attribute works fine for these speech engines.

RealSpeak Engine:

The gender attribute should not be used if the name attribute is already being used for the <voice> tag.
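
As a minimal sketch of the points above (the prompt text is illustrative only), the first fragment selects a voice by gender on AT&T Natural Voices or Cepstral, and the second shows the RealSpeak case, where the name attribute is used on its own:

```xml
<!-- AT&T Natural Voices or Cepstral: select a voice by gender -->
<speak><voice gender="female">Welcome to the main menu.</voice></speak>

<!-- RealSpeak: name is already set, so gender is left out -->
<speak><voice name="Samantha">Welcome to the main menu.</voice></speak>
```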

<age>:

AT&T Natural Voices:

This attribute is not supported.

Cepstral Engine:

This attribute looks for an exact match, instead of looking for the closest match. For example, <voice age="10"> will only select a ten-year-old voice, or fall back to the default voice if one is not found.

RealSpeak Engine:

This attribute is not supported.

<name>:

If you have an onsite system, please contact your sales account manager to find out which of these voices are installed on your server.

The following names are supported by their respective engines:

AT&T Natural Voices:

Language Name Gender US UK Audio Sample
American English (en_us) Mel male x
American English (en_us) Mike male x x
American English (en_us) Ray male x x
American English (en_us) Rich male x
American English (en_us) Claire female x
American English (en_us) Crystal female x x
American English (en_us) Julia female x
American English (en_us) Lauren female x x
Spanish (es_us) Alberto male x
Spanish (es_us) Rosa female x
British English (en_uk) Charles male x x
British English (en_uk) Anjali female x
British English (en_uk) Audrey female x x
French (fr_fr) Alain male x
French (fr_fr) Juliette female x x
German (de_de) Reiner male x x
German (de_de) Klara female x x

If no name is specified, mike is the default voice for the US AT&T Natural Voices while charles is the default voice for the UK AT&T Natural Voices.

Cepstral Engine (case-sensitive):

Language Name Gender US UK Audio Sample
American English (en_us) David male x x
American English (en_us) William male x x
American English (en_us) Diane female x x
Spanish (es_us) Miguel male x x
Spanish (es_us) Marta female x x
British English (en_uk) Lawrence male x x
British English (en_uk) Millie female x x
French (fr_fr) Jean-Pierre male x x
French (fr_fr) Isabelle female x x
German (de_de) Matthias male x x
German (de_de) Katrin female x x
Italian (it_it) Vittoria female x x

If no name is specified, Diane is the default voice for the US Cepstral Engine while Millie is the default voice for the UK Cepstral Engine.
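
Because Cepstral voice names are case-sensitive, the name should be written exactly as it appears in the table. A short sketch, with illustrative prompt text:

```xml
<speak>
  <!-- "William" matches the table entry; a different casing such as
       "william" may not match, since Cepstral names are case-sensitive -->
  <voice name="William">Your order has shipped.</voice>
</speak>
```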

RealSpeak Engine (case-sensitive):

Language Name Gender US UK Audio Sample
American English (en-US) Tom male x
American English (en-US) Jennifer female x
American English (en-US) Jill female x
American English (en-US) Samantha female x
Mexican Spanish (es-MX) Javier male x
Mexican Spanish (es-MX) Paulina female x
British English (en-GB) Daniel male x x
British English (en-GB) Emily female x x
Australian English (en-AU) Lee male x
Australian English (en-AU) Karen female x
Canadian French (fr-CA) Felix male x
Canadian French (fr-CA) Julie female x
Portuguese (pt-PT) Madalena female x
Brazilian Portuguese (pt-BR) Raquel female x
German (de-DE) Yannick male x
German (de-DE) Steffi female x x
Spanish (es-ES) Diego male x
Spanish (es-ES) Isabel female x
French (fr-FR) Sebastien male x
French (fr-FR) Virginie female x
Italian (it-IT) Silvia female x x
Dutch (nl-NL) Claire female x x
Belgian Dutch (nl-BE) Ellen female x
Mandarin Chinese (zh-CN) Mei-Ling female x

If no name is specified, Jill is the default voice for the US RealSpeak Engine while Emily is the default voice for the UK RealSpeak Engine.

Please contact your account manager if you want any of the following RealSpeak voices:

Language Name Gender
Danish (da-DK) Nanna female
Italian (it-IT) Paolo male
Indian English (en-IN) Sangeeta female
Spanish (es-ES) Monica female
Basque (eu-ES) Arantxa female
Japanese (ja-JP) Kyoko female
Korean (ko-KR) Narae female
Korean (kr-KR) Narae female
Norwegian (no-NO) Nora female
Polish (pl-PL) Agata female
Russian (ru-RU) Katerina female
Swedish (sv-SE) Ingrid female
Hong Kong Cantonese (zh-HK) Sin-ji female

For the RealSpeak Engine, this attribute MUST be used along with its corresponding xml:lang attribute if the language is not en-US (American English). For example, to hear the Mexican Spanish voice "Javier", one must type the following:

<speak xml:lang="es-MX"><voice name="Javier">
¿Hacen usted tienen gusto de los huevos?
</voice></speak>

NOTE: For US speech recognition, we currently only offer American English speech recognition, Spanish speech recognition, and French-Canadian speech recognition for Plum DEV. If you are interested in any other speech recognition languages, please contact your sales representative.

NOTE: For UK speech recognition, we currently only offer American English speech recognition and British English speech recognition for Plum DEV. If you are interested in any other speech recognition languages, please contact your sales representative.

<xml:lang>:

If you have an onsite system, please contact your sales account manager to find out which of these languages are installed on your server.

The following languages are supported by their respective engines:

AT&T Natural Voices:

Language Code Value US UK
German de_de x x
British English en_uk x x
American English en_us x x
Spanish es_us x x
French fr_fr x x

Cepstral Engine:

Language Code Value US UK
American English en_us x x

RealSpeak Engine:

Language Code Value US UK
American English en-US x
Mexican Spanish es-MX x
Canadian French fr-CA x
German de-DE x x
British English en-GB x x
French fr-FR x
Spanish es-ES x
Belgian Dutch nl-BE x
Dutch nl-NL x x

Please contact your account manager if you want any of the following RealSpeak languages:

Language Code Value
Danish da-DK
Swiss German de-CH
Australian English en-AU
Indian English en-IN
Basque eu-ES
Belgian French fr-BE
Swiss French fr-CH
Swiss Italian it-CH
Italian it-IT
Japanese ja-JP
Korean ko-KR
Korean kr-KR
Norwegian no-NO
Polish pl-PL
Brazilian Portuguese pt-BR
Portuguese pt-PT
Russian ru-RU
Swedish sv-SE
Mandarin Chinese zh-CN
Hong Kong Cantonese zh-HK

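Note that the RealSpeak Engine uses a different syntax for the xml:lang attribute: its language codes are hyphenated (for example, fr-FR), while the AT&T Natural Voices Engine and Cepstral Engine use lowercase codes joined by an underscore (for example, en_us). To hear a French speaker with RealSpeak, one would type <voice xml:lang="fr-FR">. To hear an American speaker with the AT&T Natural Voices Engine or Cepstral Engine, one would type <voice xml:lang="en_us">.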
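
The two code styles can be compared side by side. These are two separate fragments with illustrative prompt text, one per engine family:

```xml
<!-- RealSpeak: hyphenated code from the RealSpeak table -->
<speak><voice xml:lang="fr-FR">Bonjour.</voice></speak>

<!-- AT&T Natural Voices or Cepstral: underscore code from their tables -->
<speak><voice xml:lang="en_us">Hello.</voice></speak>
```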

4.2 Voice Child Tags

An "x" marks that the Child Tag is supported by the speech engine. An asterisk (*) means that there are notes to explain the difference between the speech engines.

Child Tag      AT&T Natural Voices   Cepstral Engine   RealSpeak Engine
<break>*       x                     x                 x
<emphasis>
<enumerate>    x                     x                 x
<mark>
<paragraph>*   x                     x                 x
<phoneme>*     x
<prosody>*     x                     x                 x
<say-as>*      x                     x                 x
<sentence>*    x                     x                 x
<speak>        x                     x                 x
<sub>          x                     x                 x
<value>        x                     x                 x

<break>:

AT&T Natural Voices and RealSpeak Engine:

The break element works fine for these engines.

Cepstral Engine:

The "size" attribute of the break element does not work for this engine.
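
One possible workaround on Cepstral is to give the pause as an explicit duration instead of a size. The sketch below assumes the "time" attribute defined for break in the SSML specification is honored by the engine; the text above does not confirm this, so it should be tested on your platform:

```xml
<speak>
  Please hold.
  <!-- Assumption: "time" is accepted where "size" is not -->
  <break time="500ms"/>
  Connecting your call now.
</speak>
```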

<paragraph>:

Cepstral Engine:

The "xml:lang" attribute does not work with the paragraph element.

<phoneme>:

AT&T Natural Voices:

The phoneme element works fine using the Phoneme Set shown below.

Cepstral and RealSpeak Engine:

This element is not supported.

Phoneme Set for AT&T Natural Voices:

US English:

Phoneme Example Transcription
aa Bob b aa b 1
ae bat b ae t 1
ah but b ah t 1
ao bought b ao t 1
aw down d aw n 1
ax about ax 0 b aw t 1
ay bite b ay t 1
b bet b eh t 1
ch church ch er ch 1
d dig d ih g 1
dh that dh ae t 1
dx butter b ah 1 dx er 0
eh bet b eh t 1
em Chatham ch ae 1 dx em 0
en satin s ae 1 q en 0
er bird b er d 1
ey bait b ey t 1
f fog f ao g 1
g got g aa t 1
hh hot hh aa t 1
ih bit b ih t 1
iy beat b iy t 1
jh jump jh ah m p 1
k cat k ae t 1
l lot l aa t 1
m Mom m aa m 1
n nod n aa d 1
ng sing s ih ng 1
ow boat b ow t 1
oy boy b oy 1
p pot p aa t 1
q button b ah 1 q en 0
r rat r ae t 1
s sit s ih t 1
sh shut sh ah t 1
t top t aa p 1
th thick th ih k 1
uh book b uh k 1
uw boot b uw t 1
v vat v ae t 1
w won w ah n 1
y you y uw 1
z zoo z uw 1
zh measure m eh 1 zh er 0

0 Unstressed
1 Primary stress
2 Secondary stress
& Word boundary
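
A transcription from the table can be supplied through the phoneme element. The following is a sketch only: it uses the standard SSML "ph" attribute and omits the "alphabet" attribute, and whether AT&T Natural Voices requires a particular alphabet value is not stated in this section:

```xml
<speak>
  <voice name="mike">
    <!-- "b aa b 1" is the table's transcription of "Bob" with primary stress -->
    I spoke with <phoneme ph="b aa b 1">Bob</phoneme> today.
  </voice>
</speak>
```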

UK English:

Phoneme Example Transcription
p point p OI n t 1
b big bIg1
t team t i: m 1
d dare de@1
k case k eI s 1
g good gUd1
dZ ginger dZ I n 1 dZ @ 0
tS check tS e k 1
f fool f u: l 1
v vest vest1
D this DIs1
T thick TIk1
s sell sel1
z zeal z i: l 1
S shoot S u: t 1
Z measure me1Z@0
h house h aU s 1
m main m eI n 1
n name n eI m 1
N sing sIN1
l life l aI f 1
@l bottle b Q 1 t @l 0
r right r aI t 1
j yes jes1
w wood wUd1
i: beat b i: t 1
I bit bIt1
eI bait b eI t 1
e bet bet1
A: father f A: 1 D @ 0
{ bat b{t1
@U boat b @U t 1
O: bought b O: t 1
Q boss bQs1
u: boot b u: t 1
U book bUk1
V but bVt1
3: bird b 3: d 1
aU bout b aU t 1
OI boy b OI 1
aI bite b aI t 1
@ scallop sk{1l@p0
I believe b I 0 l i: v 1

0 Unstressed
1 Primary stress
2 Secondary stress
& Word boundary

<prosody>:

AT&T Natural Voices:

The prosody element works fine for this engine. You can specify a preset rate ("fast", "medium", "slow", or "default"), but presets are not recommended because they tend to make the voice either too slow or too fast. The "rate" attribute can also be set to a numeric value such as "100.0" or "50.0"; a normal voice rate is around "150.0". These values do not follow the SSML spec, where rates are specified relative to 1. You can also adjust the voice rate with percentages: "+50%" makes the voice rate 50% faster and "-50%" makes it 50% slower. Note that the "pitch" attribute does not work for this engine.
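
The rate forms described for AT&T Natural Voices can be sketched as follows; the voice name and prompt text are illustrative:

```xml
<speak>
  <voice name="mike">
    <!-- Numeric rate: around 150.0 is described as normal -->
    <prosody rate="150.0">This is read at roughly a normal rate.</prosody>
    <!-- Percentage rates relative to the current rate -->
    <prosody rate="+50%">This is read 50% faster.</prosody>
    <prosody rate="-50%">This is read 50% slower.</prosody>
  </voice>
</speak>
```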

Cepstral Engine:

The prosody element works fine for the Cepstral Engine, which is also the only engine that supports the "pitch" attribute. Note that you cannot specify the "rate" value as a numeric value using this engine, but percentages and the preset rates ("fast", "medium", "slow", or "default") work as expected.
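
A short Cepstral sketch combining pitch with a preset rate. The pitch value "high" is taken from the SSML specification's named pitch levels, not from this manual, so treat it as an assumption:

```xml
<speak>
  <voice name="Diane">
    <!-- Preset rate plus a named pitch level (assumed to be accepted) -->
    <prosody rate="slow" pitch="high">Thank you for calling.</prosody>
  </voice>
</speak>
```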

RealSpeak Engine:

When using a RealSpeak TTS voice, the speaking rate does not revert to normal after the <prosody> tag has been used. To restore it, you must use the <prosody> tag again with the "volume" attribute set to "100.0" and the "rate" attribute set to "default". Note that the "pitch" attribute is not supported by this engine. Also, you cannot specify the "rate" value as a numeric value using this engine, but percentages and the preset rates ("fast", "medium", "slow", or "default") work as expected.
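
The reset described for RealSpeak might look like the following sketch, with an illustrative voice and prompt text:

```xml
<speak>
  <voice name="Samantha">
    <prosody rate="fast">This sentence is read quickly.</prosody>
    <!-- Explicit reset: without it, the fast rate would carry over -->
    <prosody volume="100.0" rate="default">This sentence is back at normal speed.</prosody>
  </voice>
</speak>
```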

<say-as>:

The table below shows the <say-as> tag types and the speech engines that support them. An "x" marks that the <say-as> tag is supported by the speech engine.

Say-as Tag Types AT&T Natural Voices Cepstral Engine RealSpeak Engine
acronym* x x
address x x x
number x x x
number:cardinal x x x
number:ordinal x x
number:digits x x
number:decimal x x x
number:fraction x x x
number:telephone x x x
date x x x
date:dmy* x x x
date:mdy* x x x
date:ymd* x x x
date:ym* x x
date:my* x x x
date:md* x x x
date:dm* x x x
date:y* x x x
date:m x x
date:d x x
date:day x
digits x
duration x
duration:h x
duration:hm x
duration:m x
duration:ms x
duration:s x
measure* x x x
name x x x
net:email x x x
net:uri* x x
time* x x x
time:h x x
time:hm x x x
time:hms x x
spell x
telephone* x x x
currency* x x x

acronym: The acronym tag type works fine in the US, but does not work in the UK. If you are using AT&T Natural Voices and you want to spell out words or say back digits in the UK, you would have to use commas inside of a string such as "a, c, r, o, n, y, m" or "1, 2, 3, 4, 5".
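
The two approaches can be sketched as separate fragments. This assumes say-as takes its tag type through a "type" attribute, which this section implies but does not show; the prompt text is illustrative:

```xml
<!-- US: the acronym type (the "type" attribute name is an assumption) -->
<speak><say-as type="acronym">IVR</say-as></speak>

<!-- UK with AT&T Natural Voices: commas force item-by-item reading -->
<speak><voice name="charles">Your code is 1, 2, 3, 4, 5.</voice></speak>
```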

date:mdy: The preferred format of this tag is "month abbreviation day, year". For example, to return "December 25, 2001", you would type "Dec 25, 2001". You can also use the "month/day/year" format such as "12/25/01" for the US, but this format will not work in the UK.

date:dmy: The preferred format of this tag is "day month abbreviation, year". For example, to return "December 25, 2001", you would type "25 Dec, 2001".

date:ymd: The preferred format for this tag is "year month abbreviation day". For example, to return "December 25, 2001", you would type "2001, Dec 25".
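
The three full-date formats above, written out as a sketch. As elsewhere, the "type" attribute name on say-as is assumed rather than confirmed by this section:

```xml
<speak>
  <!-- Each line uses the preferred input format for its tag type -->
  <say-as type="date:mdy">Dec 25, 2001</say-as>
  <say-as type="date:dmy">25 Dec, 2001</say-as>
  <say-as type="date:ymd">2001, Dec 25</say-as>
</speak>
```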

date:my: The format of this tag should be "month abbreviation, year". For example, to return "December, 2001", you would type "Dec, 2001".

date:md: The preferred format for this tag is "month abbreviation day". For example, to return "December 25", you would type "Dec 25". You can also use the "month/day" format such as "12/25" for the US, but this format will not work in the UK.

date:dm: The preferred format for this tag is "day month abbreviation". For example, to return "December 25", you would type "25 Dec".

date:ym: The preferred format for this tag is "year/month". For example, to return "December 2001", you would type "2001/12".
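
The partial-date types follow the same pattern. A sketch using the formats listed above, again assuming a "type" attribute on say-as:

```xml
<speak>
  <say-as type="date:my">Dec, 2001</say-as>
  <say-as type="date:md">Dec 25</say-as>
  <say-as type="date:dm">25 Dec</say-as>
  <!-- date:ym takes a numeric year/month pair -->
  <say-as type="date:ym">2001/12</say-as>
</speak>
```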

date:y: The date:y tag type works fine in the US, but does not work in the UK.

measure: For AT&T Natural Voices, the preferred format is one such as 5'4". For Cepstral or RealSpeak, either the format 5'4" or 5m will work.

net:uri: For AT&T Natural Voices, the preferred format is www.examplewebsite.com. For Cepstral, either the format http://www.examplewebsite.com or www.examplewebsite.com can be used. The RealSpeak Engine does not read web addresses back correctly: it plays a web address as 'www point examplewebsite point com'.
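
A sketch of the address forms described above, assuming a "type" attribute on say-as. The two fragments are independent:

```xml
<!-- AT&T Natural Voices (preferred) or Cepstral: no scheme prefix -->
<speak><say-as type="net:uri">www.examplewebsite.com</say-as></speak>

<!-- Cepstral also accepts the full form -->
<speak><say-as type="net:uri">http://www.examplewebsite.com</say-as></speak>
```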

time: The time tag type works fine in the US, but does not work in the UK.

telephone: The telephone tag type works fine in the US, but does not work in the UK.

The format for telephone numbers is: 123-456-7890

The format for telephone extensions is: 123-456-7890 ext1234

NOTE: For extensions, AT&T Natural Voices and RealSpeak will say the number back correctly. In the example above, AT&T Natural Voices and RealSpeak will say, "one two three four five six seven eight nine zero, extension one two three four." However, Cepstral will say, "one two three four five six seven eight nine zero, extension twelve thirty-four." To account for this, you can insert commas between the digits after "ext": 123-456-7890 ext1,2,3,4.
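
The extension formats can be sketched as two separate fragments, assuming a "type" attribute on say-as:

```xml
<!-- AT&T Natural Voices or RealSpeak: extension is read digit by digit -->
<speak><say-as type="telephone">123-456-7890 ext1234</say-as></speak>

<!-- Cepstral: commas keep the extension from being read as "twelve thirty-four" -->
<speak><say-as type="telephone">123-456-7890 ext1,2,3,4</say-as></speak>
```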

currency: When using the currency say-as type for AT&T Natural Voices with a Spanish TTS voice, format the amount as $<dollar amount>,<cents amount>. The amount will not be pronounced correctly if it is formatted as $<dollar amount>.<cents amount>.
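
For example, ten dollars and fifty cents with the AT&T Spanish voice could be written as below. This is a sketch assuming a "type" attribute on say-as and a lowercase voice name; the amount is illustrative:

```xml
<speak>
  <voice name="rosa">
    <!-- Comma, not period, separates dollars from cents for Spanish voices -->
    <say-as type="currency">$10,50</say-as>
  </voice>
</speak>
```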

<sentence>:

Cepstral Engine:

The xml:lang attribute does not work with the sentence element.