The Future of Translation?

…continued from Translation with ASR

The implications of using speech recognition for translation in the interactive voice response, or IVR, world could be far-reaching. IVR systems and translation could go hand in hand.

Barbara Dragsted, Inge Gorm Hansen and Inger M. Mees from the Copenhagen Business School were curious to see whether an automatic speech recognition (ASR) program (vital to interactive voice response) could improve translation among the translation and interpretation students at their school.

The researchers asked students to translate various documents from Danish to English, the more common direction, since English is far more widely used worldwide than Danish. They published their findings in a paper: “Speaking Your Translation: Students’ First Encounter with Speech Recognition Technology.”

Dragsted, Hansen and Mees recorded the audio output of the students speaking to the ASR and analyzed speed of translation, quality and the types of mistakes made. They found that the time for translation was reduced by the use of an ASR without hurting quality.

They also found that the majority of the ASR’s misrecognitions could be attributed to the students. When you think about it, that’s no real surprise, given the variables the ASR has to contend with.

It’s not a surprise to someone who works in IVR, either. There’s a lot of nuance in understanding the spoken word that humans grasp automatically from years of experience.

“…we looked into the number and types of error that occurred when using the SR software. Items that were misrecognised (sic) by the program could be divided into three categories: homophones [words that we pronounce the same and may spell the same, although they have different meanings], hesitations and incorrectly pronounced words. Well over fifty percent of the errors were caused by students’ mispronunciations.”

I could see that. Think about all the variables: students’ different accents (consider a northern American accent versus a southern one), the way individuals pronounce certain words (even people from the same region may pronounce words differently) and plain mistakes.

In the end, translation with ASR fell between spoken sight translation and written translation on both measures: time spent on the task (written translation took the longest) and quality (written translation scored the highest). But the researchers feel that ranking could change.

“We hypothesise (sic) that with more practice and training, SR time consumption will approach that of sight translation, and SR quality will approach that of written translation.”

That interpretation points towards improvements in the translators as opposed to the speech recognition software, which has improved by leaps and bounds in the last few years, as evidenced by the countless IVR systems now using it effectively.
