The Difficulties of Alphanum...

April 28, 2011

Speech recognition lets callers speak their responses, which are then processed and interpreted by an interactive voice response (IVR) system.

For an engine to recognize and process spoken cues, hundreds of sounds must first be recorded and loaded into a database.  The system then pieces those recordings together as concatenated speech to form the words that are read back to the caller.

The computer often runs into problems, however, because certain letters and sounds are difficult to tell apart.  Systems often store the voice cues they receive as phonetic spellings, which text-to-speech engines can struggle to interpret correctly.  Linguistic limitations restrict a computer’s ability to distinguish between certain words and sounds.

Heteronyms are especially troublesome (desert [to leave]/desert [arid region]).  Because the two words are spelled identically, a computer has a nearly impossible time telling them apart and will give both exactly the same pronunciation.  Human beings can also have trouble recognizing the difference, and they rely on deductive reasoning to work out the meaning from the context in which the word is used.

Computers, for all their storage capacity and processing power, are not programmed with these deductive reasoning skills.  Typically, a word has a single programmed pronunciation.  If a different pronunciation is needed, the user has to respell the word phonetically so that the machine speaks it correctly.
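The respelling workaround can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the lexicon, the respellings ("DEZ-ert", "dih-ZURT") and the function name are hypothetical and are not taken from any particular text-to-speech product.

```python
# Hypothetical lexicon: each word has exactly one default pronunciation.
DEFAULT_PRONUNCIATIONS = {
    "desert": "DEZ-ert",  # the noun (arid region)
}


def respell(text, overrides=None):
    """Swap words for phonetic respellings before the text reaches a TTS engine.

    `overrides` lets the user force a different respelling for one prompt.
    Words are split on whitespace only, so attached punctuation is not handled.
    """
    overrides = overrides or {}
    respelled = []
    for word in text.split():
        key = word.lower()
        # A per-prompt override wins over the default; unknown words pass through.
        respelled.append(overrides.get(key, DEFAULT_PRONUNCIATIONS.get(key, word)))
    return " ".join(respelled)


print(respell("cross the desert"))
print(respell("never desert us", {"desert": "dih-ZURT"}))
```

The engine itself never decides which meaning is intended.  The person writing the prompt supplies that decision through the override.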

Additionally, several pairs of letters sound very similar to the human ear (M and N, B and T, S and F).  Even in casual conversation, people have trouble telling these similar-sounding letters apart.

The military phonetic alphabet, which assigns a distinct word to each letter (Alpha, Bravo, Charlie, Delta, Echo), was created so that transmissions would not be misheard or garbled.  Computers have an even harder time making these distinctions in the speech recognition process.
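As a rough illustration of how code words remove the ambiguity, the Python sketch below spells an alphanumeric string with the phonetic alphabet.  The function name is hypothetical, the word list uses common English spellings, and digits are simply passed through unchanged, which is an assumption made for brevity.

```python
# Code words in alphabetical order, using common English spellings.
CODE_WORDS = (
    "Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliet "
    "Kilo Lima Mike November Oscar Papa Quebec Romeo Sierra Tango "
    "Uniform Victor Whiskey X-ray Yankee Zulu"
).split()

# Map each letter to its code word via the word's first character.
LETTER_TO_WORD = {word[0]: word for word in CODE_WORDS}


def spell_out(code):
    """Return the code word for each letter; other characters pass through."""
    return [
        LETTER_TO_WORD.get(ch.upper(), ch)
        for ch in code
        if not ch.isspace()
    ]


print(spell_out("mn7"))
```

"Mike" and "November" are far harder to confuse than "M" and "N", which is the whole point of the alphabet.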

One way to curb the difficulties caused by pronunciation and phonetics is to use DTMF (touch-tone keypad) input, which leaves very little doubt about the intended data input or output.
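DTMF is unambiguous because every key is signaled by a fixed pair of tones, one for its keypad row and one for its column.  The Python sketch below maps a measured pair of frequencies to a key using the standard DTMF frequency grid.  The function name and the 20 Hz tolerance are assumptions made for illustration, and detecting the tones in real audio is outside the scope of the sketch.

```python
# Standard DTMF grid: one low (row) tone and one high (column) tone per key.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def key_for_tones(low_hz, high_hz, tolerance_hz=20):
    """Return the key nearest to the measured tone pair, or None if too far off.

    `tolerance_hz` is an illustrative value, not a figure from any standard.
    """
    row = min(range(4), key=lambda i: abs(ROW_HZ[i] - low_hz))
    col = min(range(4), key=lambda i: abs(COL_HZ[i] - high_hz))
    if abs(ROW_HZ[row] - low_hz) > tolerance_hz:
        return None
    if abs(COL_HZ[col] - high_hz) > tolerance_hz:
        return None
    return KEYPAD[row][col]


print(key_for_tones(770, 1336))
print(key_for_tones(700, 1480))
```

Unlike spoken letters, no two keys share the same pair of tones, so the system does not have to guess between similar-sounding inputs.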
