Text-to-speech (TTS) capabilities are a critical component of most IVR systems because they allow for natural interactions between the caller and the voice application. TTS lets an application read text aloud on the fly, so it can speak dynamic content such as names, addresses, or account details instead of relying strictly on prerecorded prompts.
Many of us have had a laugh after interacting with an IVR system where the TTS pronunciation is off, but this is to be expected. Because we are interacting with machines, it is unrealistic to expect perfect pronunciation of every single word.
Developers are constantly working to correct these mispronunciations and errors, but with millions of words in the English lexicon alone (and new words being added all the time), it is impossible to correct every possible mispronunciation. Add to that the huge number of languages that businesses would like to support with TTS, and correct pronunciation becomes an even more complex proposition.
Language-specific optimization, which spells words out using a phonetic alphabet, would seem like an obvious way to help a TTS system with pronunciation, right? The question is more complex than it looks. The basic premise is whether a TTS engine could be prompted into producing natural-sounding speech by feeding it text that is formatted phonetically. In other words, how would one make TTS speech sound more natural?
Plum CEO Andrew Kuan has answered this question on Quora, and his point is a good one: “Just because you can notate the pronunciation of a word doesn’t mean that it will result in a natural reproduction of how it would be spoken by a native speaker.”
He also aptly points out that while you may be able to get the correct pronunciation of a word by using a phonetic alphabet, this still would not account for syntax or inflection, which are significant components of natural language and speech. Many of the characteristics that make speech sound natural simply cannot be conveyed through phonetic spellings.
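To make the idea of phonetic input concrete, here is a minimal sketch in Python that builds SSML markup using the standard `<phoneme>` element with IPA transcriptions. The helper names `phoneme` and `speak` are hypothetical, the transcriptions are illustrative, and whether a given TTS engine honors the `<phoneme>` element is an assumption you would need to check against that engine's documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(word: str, ipa: str) -> str:
    """Wrap a word in an SSML <phoneme> element carrying an IPA transcription.

    Hypothetical helper: it only builds markup and does not talk to any engine.
    """
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


def speak(*parts: str) -> str:
    """Join already-escaped fragments inside an SSML <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"


# Two pronunciations of the same word, spelled out phonetically.
ssml = speak(
    escape("You say "),
    phoneme("tomato", "təˈmeɪtoʊ"),
    escape(", I say "),
    phoneme("tomato", "təˈmɑːtəʊ"),
    escape("."),
)
print(ssml)
```

Note what the markup does not capture: it pins down the sounds of one word, but says nothing about stress across the sentence, pacing, or intonation, which is exactly the limitation described above.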
The more relevant consideration when trying to identify a high-quality TTS system is its components, including whether it has a robust pronunciation dictionary and a capable text parser.
If an engine possesses these qualities, it is likely a good TTS system that can function without much language-specific optimization or other superfluous input.
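As a rough illustration of what a dictionary and a text parser contribute, the following Python sketch normalizes raw text before it would be handed to a speech engine. It is a toy under stated assumptions, not how any particular engine works: the `LEXICON` entries, the digit-by-digit number reading, and the function names are all hypothetical.

```python
import re

# Hypothetical pronunciation dictionary: maps written tokens to speakable text.
LEXICON = {
    "dr.": "doctor",
    "st.": "street",
    "ivr": "I V R",
    "tts": "T T S",
}

ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]


def spell_digits(match: re.Match) -> str:
    """Read a run of digits one digit at a time, e.g. '42' -> 'four two'."""
    return " ".join(ONES[int(d)] for d in match.group())


def normalize(text: str) -> str:
    """Toy text parser: expand digits, then look each token up in the lexicon."""
    text = re.sub(r"\d+", spell_digits, text)
    words = [LEXICON.get(word.lower(), word) for word in text.split()]
    return " ".join(words)


print(normalize("Call Dr. Smith at 42 Main St."))
# prints: Call doctor Smith at four two Main street
```

Even this small example hints at why the parser matters: "St." could just as easily mean "saint", and "42" could be read as "forty-two", so a real engine needs context that a simple lookup table cannot provide.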

