Multimodal Speech Interpretation

In the IVR industry, ‘multimodal’ is a relatively new buzzword. In a nutshell, it means communicating through more than just phone calls, which matches the direction our culture is heading with texting and social media.

Until the last few years, organizations using IVR systems were limited to phone calls, with email handled separately, for reaching their customers or employees. Now IVR systems themselves support email, texting and social media.

That’s the outgoing IVR, though. What about incoming messages? Researchers from Ambedkar Marathwada University in India have been working on a multimodal interface for speech interpretation software.

That may sound like a contradiction (multimodal implies more than just speech interpretation), but the work is really about extending speech interpretation software so it can handle other forms of communication.

In Online Multimodal Interaction for Speech Interpretation, researchers Vaishali Ingle and Aditi Deshpande describe their work with the Extensible MultiModal Annotation markup language (EMMA) “to provide semantic interpretations for speech, natural language text, keyboard and ink input (a type of stylus input that includes handwriting recognition).”

Basically, EMMA enhances speech interpretation technology by describing input from alternative methods in one common format, so the speech interpretation software can work with those inputs too.

From the Indian researchers’ paper:

In this paper, we describe an implementation of multimodal interaction for speech interpretation to enable access to the web. As per [a] W3C recommendation on 10th February 2009, the latest version of EMMA is used for translation of speech signals into a format interpreted by the application language, greatly simplifying the process of adding multiple modes to an application.
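To make that idea a little more concrete, here is a minimal Python sketch of how an application might read EMMA results that arrive from different input modes. It is not the researchers' implementation. The two sample documents, the `command` and `target` payload elements, and the `read_interpretations` helper are all made up for illustration; only the EMMA namespace and the `emma:medium`, `emma:mode`, `emma:confidence` and `emma:tokens` annotations come from the W3C recommendation.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C EMMA 1.0 recommendation.
EMMA_NS = "http://www.w3.org/2003/04/emma"

# Hypothetical EMMA result produced by a speech recognizer.
SPOKEN = """\
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:medium="acoustic" emma:mode="voice"
      emma:confidence="0.82" emma:tokens="open the news page">
    <command>open</command>
    <target>news</target>
  </emma:interpretation>
</emma:emma>
"""

# Hypothetical EMMA result for the same request typed on a keyboard.
TYPED = """\
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:medium="tactile" emma:mode="keys"
      emma:tokens="open news">
    <command>open</command>
    <target>news</target>
  </emma:interpretation>
</emma:emma>
"""


def read_interpretations(xml_text):
    """Return one dict per emma:interpretation found in an EMMA document."""
    root = ET.fromstring(xml_text)
    results = []
    for interp in root.iter(f"{{{EMMA_NS}}}interpretation"):
        confidence = interp.get(f"{{{EMMA_NS}}}confidence")
        results.append({
            "medium": interp.get(f"{{{EMMA_NS}}}medium"),
            "mode": interp.get(f"{{{EMMA_NS}}}mode"),
            "tokens": interp.get(f"{{{EMMA_NS}}}tokens"),
            # Confidence is optional, so keep None when it is absent.
            "confidence": float(confidence) if confidence is not None else None,
            # Application payload (assumed here): child tag -> text content.
            "payload": {child.tag: (child.text or "").strip() for child in interp},
        })
    return results


if __name__ == "__main__":
    for document in (SPOKEN, TYPED):
        for result in read_interpretations(document):
            print(result["mode"], result["payload"])
```

The point of the sketch is that the application code handles the spoken and typed requests in exactly the same way: the mode is just an annotation on the result, and the payload the application acts on has the same shape either way.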

As with IVR, offering multiple means of communicating with software (talking, typing, writing and so on) enhances the communication process.
