Speech recognition is cool. I’m still saying that after working here at Plum for 8 years. It’s cool that the IVR can listen for any spoken US city-state combination and recognize it accurately and reliably. It’s cool that the IVR can listen for 90% of the given names used in the US. And it’s cool that I can drive an IVR app with just my voice while…um…driving.
But just because something’s cool doesn’t mean it should be used everywhere. Speech rec has been around long enough that IVR designers should know better by now but, alas, that’s not the case in practice. The worst offender is any IVR application that relies on speech recognition while being called from a noisy, talky environment. Several airlines, for example, use automatic speech recognition (ASR) in their flight status lines. It’s a strikingly ill-advised decision to rely on speech recognition in an environment like an airport concourse. Even when I’m connected to a live agent while at an airport, I have trouble hearing them and vice versa. Replace that human agent with an ASR engine and you’ll discover there’s nothing more irritating than hearing the airline IVR say, “Please say your destination city,” and then having the airport PA system say “San Francisco” loud enough for the airline IVR to hear and accept.
How do most of the airlines solve this problem? They rely on callers getting frustrated and hitting “0” to get a human being on the line. But if you’re going to do that, why even bother with an IVR in the first place? Instead of letting customers call agents directly, which costs the airline money, they irk their customers first, then transfer them to an agent, and it still costs them money. It’s a lose-lose situation.
So what’s my solution?
- Don’t use speech recognition when your callers are likely to be in noisy environments and accuracy is required. Use DTMF (touch-tone) input instead. People are pretty used to text-messaging now, so if they have to spell “Boston” on their telephone keypad, they won’t be befuddled.
- If you absolutely want speech rec in an IVR app that will often be called from a noisy environment, design the app to be modal: have it switch from voice-and-DTMF mode to DTMF-only mode if the caller’s speech input seems consistently erroneous.
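
To make both suggestions a bit more concrete, here is a rough Python sketch, not tied to any particular IVR platform. The names `to_keypad_digits` and `InputModeTracker`, the failure threshold, and the confidence cutoff are all made up for illustration. A real app would take its confidence scores from whatever ASR engine it uses and would tune both numbers.

```python
from dataclasses import dataclass

# Standard telephone keypad letter groups (the ITU-T E.161 layout).
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def to_keypad_digits(word: str) -> str:
    """Return the digits a caller would press to spell `word`.

    Characters that are not letters (spaces, hyphens) are skipped.
    """
    return "".join(
        LETTER_TO_DIGIT[ch] for ch in word.upper() if ch in LETTER_TO_DIGIT
    )


@dataclass
class InputModeTracker:
    """Hypothetical helper: falls back to DTMF-only after repeated speech failures."""

    max_consecutive_failures: int = 2  # assumed threshold, tune per application
    min_confidence: float = 0.6        # assumed cutoff, tune per ASR engine
    mode: str = "voice-and-dtmf"
    failures: int = 0

    def record_speech_result(self, confidence: float, confirmed: bool) -> str:
        """Update the mode after one speech turn and return the current mode.

        A turn counts as a failure if the engine's confidence is low or the
        caller rejected the recognized value when asked to confirm it.
        """
        if self.mode == "dtmf-only":
            return self.mode  # one-way switch for the rest of the call
        if confirmed and confidence >= self.min_confidence:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.max_consecutive_failures:
                self.mode = "dtmf-only"
        return self.mode


if __name__ == "__main__":
    print(to_keypad_digits("Boston"))
    tracker = InputModeTracker()
    print(tracker.record_speech_result(confidence=0.3, confirmed=False))
    print(tracker.record_speech_result(confidence=0.8, confirmed=False))
```

The switch is deliberately one-way in this sketch: once the call looks noisy, the app stops listening for speech for the rest of the call. Also keep in mind that different names can map to the same digit string, so a keypad-spelling prompt still needs to match the digits against a list of valid cities and ask the caller to pick when more than one fits.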
