
IVR and Audio Transcription: Find the Method That’s Right for You

Quantitative data makes the business world go ‘round, and qualitative data provides the nuance and shading that put all those numbers in context. But qualitative data can be difficult to collect and analyze. Using IVR (interactive voice response) to record and transcribe customers’ spoken responses is one way to gather it.

There are two methods we use here at Plum to transcribe audio from voice recordings. Each has different benefits and levels of complexity, and each is suited to a different purpose.

Hybrid Transcription

Hybrid transcription combines machine-based and human-based transcription. Your IVR app sends the recorded audio to a speech recognition engine, which attempts to work out what was said. The engine flags unknown or questionable words and generates an accuracy score. If the score is high enough, the engine returns the transcript to the app. If the score is too low, the engine sends the file to a human to double-check before returning the result to the app.
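The routing decision described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Plum’s implementation: the `MachineResult` type, the `hybrid_transcribe` function, and the 0.85 threshold are all hypothetical, and the speech engine and the human reviewer are passed in as plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical cut-off; the actual score that triggers human review is not specified.
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class MachineResult:
    text: str          # the engine's best guess at what was said
    confidence: float  # overall accuracy score, 0.0 to 1.0
    flagged_words: list[str] = field(default_factory=list)  # unknown/questionable words


def hybrid_transcribe(
    audio: bytes,
    asr_engine: Callable[[bytes], MachineResult],
    human_review: Callable[[bytes, MachineResult], str],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> str:
    """Return a transcript, escalating to a human when the machine score is low."""
    result = asr_engine(audio)
    if result.confidence >= threshold:
        # Score is high enough: hand the machine transcript straight back to the app.
        return result.text
    # Score is too low: a person checks the audio against the draft and its flagged words.
    return human_review(audio, result)
```

In this sketch, `asr_engine` stands in for a call to a speech recognition service and `human_review` for whatever process queues the file for a transcriber and waits for the corrected text.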

The advantage of this method is that it is less expensive and less time-consuming than purely human-based transcription. It tends to be reliable and returns results fairly quickly, typically within minutes.

The downside is that the transcription workflows are complex and can be time-consuming to develop. The process also introduces a delay before results are available. Plum Insight customers, for example, often want to transcribe open-ended survey responses. If a survey does not need transcription, its results are sent immediately. If it does, the application waits for the transcription and then sends it together with the rest of the survey results.
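A minimal sketch of that delivery rule, assuming each outstanding transcription is represented as an awaitable that eventually yields text. The `deliver_survey` function, the question IDs, and the `fake_transcription` stand-in are hypothetical; a production system would also need timeouts and error handling.

```python
import asyncio
from collections.abc import Awaitable, Mapping


async def deliver_survey(
    answers: Mapping[str, str],
    pending: Mapping[str, Awaitable[str]],
) -> dict[str, str]:
    """Return one combined result set for a completed survey call.

    `answers` holds responses that are ready now (e.g. keypad entries);
    `pending` maps question IDs to transcriptions still in progress.
    """
    if not pending:
        # No transcription needed: send the results right away.
        return dict(answers)
    # Wait for every outstanding transcription; gather preserves input order.
    transcripts = await asyncio.gather(*pending.values())
    return {**answers, **dict(zip(pending.keys(), transcripts))}


async def fake_transcription(delay: float, text: str) -> str:
    # Hypothetical stand-in for a transcription job that finishes after `delay` seconds.
    await asyncio.sleep(delay)
    return text


async def main() -> dict[str, str]:
    answers = {"q1": "5", "q2": "yes"}
    pending = {"q3": fake_transcription(0.1, "the wait time was too long")}
    return await deliver_survey(answers, pending)


if __name__ == "__main__":
    print(asyncio.run(main()))
    # prints {'q1': '5', 'q2': 'yes', 'q3': 'the wait time was too long'}
```

Bundling everything into one result set keeps the downstream consumer simple, at the cost of holding back the non-transcribed answers until the slowest transcription finishes.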

So, a lot depends on what data you need, how you’re using that data, and when you need it.

Real-Time Transcription

Real-time transcription is the alternative, and its inner workings are more straightforward than those of hybrid transcription. The captured audio is sent to a natural language processing engine that determines the caller’s intent and quickly returns that information to the app. Whereas the hybrid method can take several minutes to complete, the real-time method takes only a couple of seconds.

The main advantage of real-time transcription is speed: it delivers results in seconds rather than minutes. This type of transcription works best for open-ended speech recognition where you don’t need to act on what the caller said but do want to store it as part of your call flow. It also works well with artificial intelligence applications that determine caller intent.

For example, if a health care company conducts a medical survey and one question aims to assess patients’ emotional state, it could use real-time transcription to get a general sense of how each patient is doing and then move on to the next question.
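As a rough illustration of that flow, the sketch below substitutes a simple keyword match for the natural language processing engine. It is a deliberately simplified stand-in, not how a real NLP engine works: the keyword lists, `general_mood`, and `handle_mood_question` are all hypothetical. The handler labels the answer, stores both the transcript and the label in the call record, and lets the call flow continue.

```python
import re

# Hypothetical keyword lists standing in for a real NLP engine's intent model.
MOOD_KEYWORDS = {
    "positive": {"good", "great", "fine", "better", "well"},
    "negative": {"bad", "worse", "tired", "sad", "anxious", "pain"},
}


def general_mood(transcript: str) -> str:
    """Return a rough mood label for a short spoken answer."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    scores = {mood: len(words & keywords) for mood, keywords in MOOD_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Ties (including no matches at all) fall back to "neutral".
    if list(scores.values()).count(scores[best]) > 1:
        return "neutral"
    return best


def handle_mood_question(transcript: str, call_record: dict) -> str:
    """Store the answer and its label, then tell the call flow to move on."""
    call_record["mood_answer"] = transcript
    call_record["mood"] = general_mood(transcript)
    return "next_question"  # hypothetical name of the next step in the call flow
```

Keyword matching is crude; it would label “not bad” as negative, for instance, which is why a dedicated NLP engine is the better fit for this job.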


Plum Voice offers two audio transcription options, hybrid and real-time, so you can choose based on the speed and accuracy you need. Note that both methods struggle with proper nouns such as names, streets, and cities. These items often lack surrounding context, so they are not built into most grammars, which lowers transcription accuracy.

If you have questions about audio transcription or which method is right for your needs, contact us today.