Prosody Behavior for TTS Eng...

March 30, 2011

In several forum posts, developers have reported strange behavior when testing IVR code (specifically, the <prosody> tag) across the different TTS engines: AT&T Natural Voices, Cepstral Swift, and Nuance Realspeak.

For example, here is some sample IVR code testing the <prosody> tag for AT&T Natural Voices:

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <prompt>
        <voice name="crystal">
          <prosody rate="150.0">
            <say-as type="address">
              The Mall Outlet
              123 Main St
              Chicago, Illinois
            </say-as>
          </prosody>
        </voice>
      </prompt>
    </block>
  </form>
</vxml>

However, if you take this exact code and swap “crystal” for “Diane” (Cepstral) or “Samantha” (Realspeak), the prompt is spoken back extremely quickly. To get a usable rate, you have to adjust the code to look like the following:

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <prompt>
        <voice name="Samantha">
          <prosody rate="+5%">
            <say-as type="address">
              The Mall Outlet
              123 Main St
              Chicago, Illinois
            </say-as>
          </prosody>
        </voice>
      </prompt>
    </block>
  </form>
</vxml>

Note how the rate in the <prosody> tag was changed to “+5%” instead. These kinds of inconsistencies between TTS engines can make life tough for an IVR developer.
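One thing that may be worth trying is the set of named rate labels that SSML defines for <prosody> (x-slow, slow, medium, fast, x-fast, default) instead of a bare number or a percentage. The snippet below is only an untested sketch of that idea: whether all three engines honor the labels, and how fast each one makes “fast”, is an assumption and has not been verified against AT&T Natural Voices, Cepstral Swift, or Realspeak.

```xml
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <prompt>
        <!-- Swap in "crystal", "Diane", or "Samantha" to compare the engines. -->
        <voice name="Samantha">
          <!-- Assumption: the engine maps the SSML label "fast" to a sensible rate. -->
          <prosody rate="fast">
            <say-as type="address">
              The Mall Outlet
              123 Main St
              Chicago, Illinois
            </say-as>
          </prosody>
        </voice>
      </prompt>
    </block>
  </form>
</vxml>
```

If the labels turn out to behave differently on each engine as well, the per-engine rate values shown above remain the fallback.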
