Even coming up with a name for each emotion (like happy or sad) is hard, and once you try to distinguish different levels of joy without a solid measuring system, things get complicated fast.
For those of us working in industries as focused on precision and accuracy as IVR, categorizing emotion is a nightmare. After overcoming the hurdle of breaking down human emotion into a handful of variables, we have a few more factors to consider before we’re ready to teach IVR systems what to look for.
Sezgin and his team write:
“The existence of different contents, genders, speakers and speaking styles raise complications because these properties have direct affect on the features such as pitch and energy contours.”
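To get a feel for why speaker variability is such a headache, here's a toy sketch (my own illustration, not from the paper): two voices carry the same intonation shape, but their raw pitch contours look nothing alike until you normalize each speaker to their own baseline.

```python
import numpy as np

# Toy illustration (not the authors' data or method): two speakers produce the
# same intonation shape, but their baseline pitch and range differ, so the raw
# pitch contours (in Hz) look very different.
rng = np.random.default_rng(0)
shared_shape = np.sin(np.linspace(0, np.pi, 50))             # same rise-and-fall contour
speaker_a = 110 + 20 * shared_shape + rng.normal(0, 2, 50)   # lower-pitched voice
speaker_b = 220 + 40 * shared_shape + rng.normal(0, 4, 50)   # higher-pitched voice

# Per-speaker z-score normalization strips out the baseline and range
# differences, leaving the contour shape that emotional cues actually ride on.
def normalize(contour):
    return (contour - contour.mean()) / contour.std()

# Raw contours correlate poorly in absolute terms; normalized ones line up.
print(np.corrcoef(normalize(speaker_a), normalize(speaker_b))[0, 1])  # close to 1.0
```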
With all of these considerations (and many others) in play, it seems nearly impossible that anyone could come close to building a prototype, let alone emotion-recognizing software that actually works.
Somehow, though, Sezgin and his team figured it out.
Most emotion detection systems are built into speech-rec software that focuses on every single syllable. That’s because if you’re trying to understand what someone is saying, you need to understand pretty much every word.
With emotions, though, overall tone matters more than individual words, so Sezgin, Gunsel and Kurt built their system around a more comprehensive perceptual feature set.
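As a rough illustration of what an utterance-level, perceptual approach can look like (a hedged sketch, not the authors' actual feature set; the file name and the specific statistics are placeholders), you could summarize the pitch and energy contours of a whole utterance into a handful of numbers and pass those to a classifier, instead of scoring every frame or syllable:

```python
import numpy as np
import librosa

# Sketch only: reduce an utterance's pitch and energy contours to a small
# perceptual feature vector describing its overall tone.
y, sr = librosa.load("utterance.wav", sr=16000)   # placeholder file name

# Pitch (F0) contour via pYIN; unvoiced frames come back as NaN.
f0, voiced_flag, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)

# Short-time energy (RMS) contour.
energy = librosa.feature.rms(y=y)[0]

# Utterance-level statistics: overall tone rather than per-syllable detail.
features = np.array([
    np.nanmean(f0), np.nanstd(f0),        # pitch level and variability
    np.nanmax(f0) - np.nanmin(f0),        # pitch range
    energy.mean(), energy.std(),          # loudness level and variability
    voiced_flag.mean(),                   # rough proportion of voiced frames
])
print(features)   # this vector would feed whatever classifier you train
```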
Working from this premise, they were able to create a program that performed 7-16% better than the leading systems available today. That may not seem like a huge margin of improvement, but when you consider how new emotion recognition really is, it's a good start.