We seem to get a lot of questions about why our speech recognition engine can’t transcribe any random thing you’d like to say. Two things work against the speech recognition engine. First, open-ended speech transcription is very hard to do without first training the engine on the speaker’s voice. I doubt most callers would be willing to hang out for a few minutes training the IVR to understand what they’re saying. Even the best desktop speech recognition systems still require the user to make corrections with the keyboard. That level of setup and interaction simply can’t be expected of callers who want to get information over the phone quickly.
Second, the public switched telephone network (PSTN) is not a high-fidelity audio system. Your CDs store music at 44.1kHz, 16-bit, stereo. The PSTN transmits audio at 8kHz, 8-bit, mono. That’s roughly a 22-to-1 difference in raw bitrate. The mic on your PC usually records at 44.1kHz, 16-bit, mono; that’s still about 11-to-1. Sampling at 8kHz also means nothing above 4kHz can be carried at all, and in practice the phone network passes only about 300 to 3,400Hz, discarding much of the high-frequency detail that helps distinguish consonant sounds. An IVR is therefore at a huge disadvantage relative to a PC when it comes to processing speech. That’s also why spelling out letters to an IVR is so difficult. Even humans have trouble telling “b” from “p”, or “t” from “d”, and we have the advantage of being able to divine context from what’s spoken. If people have trouble with spelling over the phone, you can bet a speech recognition engine doesn’t stand a chance.
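The 22-to-1 and 11-to-1 figures come from comparing raw bitrates, i.e. sample rate × bits per sample × channels. Here is a minimal Python sketch of that arithmetic. The function and dictionary names are purely illustrative, and the sketch treats every format as plain linear PCM. That is an assumption: real telephone audio is 8-bit companded (μ-law or A-law), so its 8 bits cover more dynamic range than 8-bit linear samples would. It also prints each format’s Nyquist limit (half the sample rate), the highest frequency that format can represent at all.

```python
# Back-of-the-envelope comparison of raw (uncompressed PCM) audio bitrates.
# Assumption: all formats are treated as plain linear PCM; real telephone
# audio is 8-bit companded (mu-law / A-law), which this sketch ignores.

def bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw bitrate in bits per second: rate x bit depth x channel count."""
    return sample_rate_hz * bits_per_sample * channels


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


# Hypothetical lookup table: (sample rate in Hz, bits per sample, channels)
FORMATS = {
    "CD": (44_100, 16, 2),      # 44.1 kHz, 16-bit, stereo
    "PC mic": (44_100, 16, 1),  # 44.1 kHz, 16-bit, mono
    "PSTN": (8_000, 8, 1),      # 8 kHz, 8-bit, mono
}

PSTN_BPS = bitrate_bps(*FORMATS["PSTN"])

if __name__ == "__main__":
    for name, (rate, bits, channels) in FORMATS.items():
        bps = bitrate_bps(rate, bits, channels)
        ratio = bps / PSTN_BPS  # how many times the phone line's bitrate
        print(f"{name:7} {bps:>9,} bps  {ratio:5.2f}x PSTN  "
              f"Nyquist limit {nyquist_hz(rate):>7,.0f} Hz")
```

Run as a script, it reports ratios of roughly 22 and 11 to 1, matching the figures above, and a Nyquist limit of 4,000 Hz for the phone line versus 22,050 Hz for the CD and PC mic.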