The Limitations of Automatic Speech Recognition & How to Combat Them

Automatic Speech Recognition (ASR) is pretty cool technology. Simply say something and a machine will do what you ask it to do. Usually.

When an ASR grammar is tuned properly and the call conditions are good, ASR works really well.

Unpacking that last sentence should help you understand a bit more about ASR, how it works, and how to recognize when to implement it.

Because let’s be honest, ASR provides a great customer experience, but no technology is perfect. You should be aware of some of ASR’s limitations before going too crazy with it.


We’ve talked about ASR and how it relates to natural language processing, and even artificial intelligence, so we don’t need to spend too much time on grammars. As a quick refresher, a grammar is the part of the speech rec system that is programmed with all the different utterances that could constitute a valid response to a prompt.

If an IVR asks you a yes/no question, there are tons of different ways to say ‘no’: nope, nah, nuh uh, no, etc. When people talk about tuning a grammar, they’re talking about listening to all the different ways callers actually respond to prompts and making sure that those specific utterances are in the grammar.

For companies with broad national or global reach, this issue is compounded when you factor in different accents and dialects.
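To make the idea of a grammar concrete, here is a minimal sketch in Python. It is only an illustration: real speech grammars are usually written in a dedicated grammar format and matched against audio, not text. The name `match_utterance` and the list of ‘yes’ phrases are assumptions made up for this example; the ‘no’ phrases come from the list above.

```python
import re
from typing import Optional

# A toy "grammar": each meaning maps to the utterances that count as that answer.
# The "yes" phrases are assumed for illustration.
YES_NO_GRAMMAR = {
    "yes": {"yes", "yeah", "yep", "yup", "sure"},
    "no": {"no", "nope", "nah", "nuh uh"},
}


def match_utterance(utterance: str) -> Optional[str]:
    """Return "yes" or "no" if the utterance is in the grammar, else None."""
    # Lowercase, then turn punctuation and hyphens into spaces so "Nuh-uh!" becomes "nuh uh".
    normalized = re.sub(r"[^a-z]+", " ", utterance.lower()).strip()
    for meaning, phrases in YES_NO_GRAMMAR.items():
        if normalized in phrases:
            return meaning
    return None  # out of grammar: the caller said something we did not plan for
```

In this picture, tuning a grammar amounts to reviewing the utterances that come back as `None` and adding the legitimate ones to the right set.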

Connection Matters

As important as grammars are, it’s equally important to pay attention to phone connections. Any call that goes into Plum’s (or any other) IVR platform will traverse the PSTN at some point. The PSTN’s bandwidth is limited, so all audio that travels across it is down-sampled and compressed to 8-bit, 8 kHz.

Remember tape dubbing? If you made a copy of a VHS or a cassette, the copy wasn’t quite as clear as the original. If you made a copy of the copy, that third-generation copy was degraded even more. That’s essentially what’s happening with the down-sampling and compression.

This matters for speech rec because you want the ASR engine to have the highest quality audio possible when it tries to decode utterances.
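A rough sketch of what 8-bit, 8 kHz means in numbers may help here. The Python below assumes mu-law companding, the scheme commonly used for telephone audio in North America; the article itself doesn’t name a codec. It uses the textbook mu-law formula rather than the exact bit layout of any telephony standard, and the 16-bit, 44.1 kHz mono figure is only a comparison point chosen for this example.

```python
import math

MU = 255  # mu-law companding constant


def mu_law_encode(sample: float) -> int:
    """Squeeze a sample in [-1.0, 1.0] into one of 256 levels (8 bits)."""
    sample = max(-1.0, min(1.0, sample))
    compressed = math.copysign(
        math.log1p(MU * abs(sample)) / math.log1p(MU), sample
    )
    return round((compressed + 1.0) / 2.0 * 255)


def mu_law_decode(code: int) -> float:
    """Expand an 8-bit code back to an approximate sample value."""
    compressed = code / 255 * 2.0 - 1.0
    return math.copysign(((1 + MU) ** abs(compressed) - 1) / MU, compressed)


def bitrate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Uncompressed bits per second for a single audio channel."""
    return sample_rate_hz * bits_per_sample


telephone = bitrate(8_000, 8)   # 64,000 bits per second
cd_mono = bitrate(44_100, 16)   # 705,600 bits per second, for comparison
```

Decoding an encoded sample returns a value close to the original but not identical to it. That small loss, on top of the lower sample rate, is the “copy of a copy” effect the ASR engine has to work with.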

There are other factors that affect audio quality as well. Jitter from VoIP phones, static on the line or a bad connection, and background noise all have a negative effect on audio quality.

This is one of the reasons that Plum prides itself on using only Tier 1 telecom providers with our cloud IVR platform. We want you to have the best audio connection and call quality possible.

Safety Valves

There are many pieces to the phone connection puzzle. And it’s important to remember that not every caller will be calling from a busy city street undergoing major construction.

Still, you want all your callers to have a great experience. So here are a few tips to help those callers who run into issues with ASR.

  1. Even if ASR is your first choice for your callers, make sure that you provide DTMF options.
  2. If the ASR is having trouble understanding a caller’s responses, don’t continually re-prompt them or disconnect the call. Set a reasonable re-prompt threshold and then switch to emphasizing DTMF input instead.
  3. Make your prompts non-bargeable. A bargeable prompt is one where callers can input their response before the prompt finishes. With ASR enabled, barge-in can backfire: background noise or a stray word can be picked up as a response and cut the prompt off.
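
The first two tips can be sketched as a small piece of control flow. This is a hedged illustration in Python, not how Plum Fuse or any particular IVR platform implements it. The function `collect_answer` and its two callbacks are hypothetical stand-ins for whatever your platform provides to play a prompt and gather speech or keypad input, and the default threshold of two attempts is an assumption to tune for your own callers.

```python
from typing import Callable, Optional, Tuple


def collect_answer(
    recognize_speech: Callable[[], Optional[str]],
    read_dtmf: Callable[[], Optional[str]],
    max_asr_attempts: int = 2,  # assumed re-prompt threshold
) -> Tuple[Optional[str], str]:
    """Try ASR a limited number of times, then fall back to DTMF.

    Both callbacks are hypothetical: each plays its own prompt and returns
    the caller's answer, or None if nothing usable was captured.
    """
    for _ in range(max_asr_attempts):
        answer = recognize_speech()
        if answer is not None:
            return answer, "asr"
    # Threshold reached: stop re-prompting for speech and ask for keypad input.
    return read_dtmf(), "dtmf"
```

The second value in the result records which input mode produced the answer, which is handy if you want to track how often callers end up on the keypad.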

Fortunately, with a modern, feature-rich IVR development platform, like Plum Fuse, you don’t have to worry about wasting a ton of development time planning for these exceptions. Fuse makes it easy to manage ASR, giving you granular control over when and how it’s used in your IVR applications.

Schedule a demo of Plum Fuse today to learn more.
