Last week I talked about audio encoding formats but did not address how the decoder knows how the encoder encoded the audio (try saying that ten times quickly!).
There are really only two ways to address this issue. Method one is encapsulating the data with the encoding descriptors. Method two is…to guess.
The only encapsulation file format supported by the Plum IVR is Microsoft .wav, which is derived from a format called RIFF. People often think that Microsoft .wav is both a file format and an audio encoding format. It isn’t; it is only a file format, and .wav/RIFF is independent of the audio encoding. Without getting into too much detail, you can think of the .wav/RIFF format as merely an envelope; the data enclosed within the envelope can be encoded any number of ways, from PCM or u-law (as mentioned last week) to MP3 to various proprietary audio encoding formats. So if you are going to create a .wav file, it’s important to make sure that the audio inside it is encoded using one of the formats mentioned last week.
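To make the envelope idea a little more concrete, here is a minimal Python sketch that opens a .wav file and reports which encoding its header claims to contain. The function name describe_wav is my own invention, the format tag numbers are the standard registered WAVE values, and none of this is meant to show how the Plum IVR itself reads files.

```python
import struct

# A few registered WAVE format tags. Many more exist; these are the
# ones relevant to the encodings discussed in this post.
FORMAT_TAGS = {
    0x0001: "linear PCM",
    0x0006: "A-law",
    0x0007: "u-law",
    0x0055: "MP3",
}


def describe_wav(path):
    """Return the encoding details declared in a .wav file's 'fmt ' chunk."""
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        # Walk the chunks inside the envelope until 'fmt ' turns up.
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                fmt = f.read(chunk_size)
                if len(fmt) < 16:
                    raise ValueError("'fmt ' chunk is too short")
                tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                    "<HHIIHH", fmt[:16]
                )
                return {
                    "encoding": FORMAT_TAGS.get(tag, "other (tag 0x%04X)" % tag),
                    "channels": channels,
                    "sample_rate": rate,
                    "bits_per_sample": bits,
                }
            # Skip this chunk; chunks are padded to an even byte count.
            f.seek(chunk_size + (chunk_size & 1), 1)
```

Running describe_wav on a file before uploading it is a quick way to confirm that the envelope really holds 8kHz mono PCM or u-law rather than, say, MP3.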
That all said, you could also just send the IVR raw audio data and have the IVR guess at the format. You do, however, have to give the IVR a bit of a hint in the form of an appropriate file name extension. If you encoded your audio data as 8kHz 16-bit PCM mono, just slap a “.pcm” on the end of the filename and the IVR will assume that’s the format. On the other hand, if you recorded your audio data as 8kHz 8-bit u-law mono, add “.ul” to the end of your filename. These types of files are often referred to as “raw, headerless” files because there’s no metadata whatsoever in the file; it’s all pure audio data. The downside to this is that there’s nothing to stop you from recording 11kHz 8-bit PCM stereo but still naming the file “whatever.pcm”. The IVR will load it, assume the data is 8kHz 16-bit PCM mono because of the extension, and produce some noisy garbage over your phone lines.
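Here is a rough Python sketch of that guessing rule, mostly to show why a misnamed file goes wrong. The table and function names are hypothetical, the case-insensitive extension match is my own assumption, and I am taking "11kHz" to mean the usual 11,025 Hz; this is not the IVR's actual code.

```python
import os

# Extension -> (encoding, sample rate in Hz, bytes per sample, channels),
# following the two raw formats described above.
RAW_FORMATS = {
    ".pcm": ("16-bit linear PCM", 8000, 2, 1),
    ".ul": ("8-bit u-law", 8000, 1, 1),
}


def assumed_format(filename):
    """Return the format a decoder would assume from the extension alone."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in RAW_FORMATS:
        raise ValueError("no raw format is associated with %r" % ext)
    return RAW_FORMATS[ext]


def assumed_duration(filename, size_in_bytes):
    """Playback length in seconds if the bytes are read as the assumed format."""
    _encoding, rate, width, channels = assumed_format(filename)
    return size_in_bytes / (rate * width * channels)


# Ten seconds of 11,025 Hz, 8-bit, stereo audio is 10 * 11025 * 1 * 2 bytes.
misnamed_size = 10 * 11025 * 1 * 2
print(assumed_duration("whatever.pcm", misnamed_size))  # 13.78125
```

The ten-second recording comes out as almost fourteen seconds because every pair of 8-bit left/right samples gets read as a single 16-bit sample at the wrong rate, which is where the noisy garbage comes from.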
One final thing to mention is MP3. The Plum IVR can handle MP3s just fine; however, we often hear complaints about the decline in audio quality between what someone hears when their MP3 is played over their headphones and what is ultimately heard over the phone. Bear in mind: the phone system was never intended to transmit high-fidelity audio. That’s why we usually recommend the PCM or u-law formats instead: application developers have much better control over the final sound quality when what they hear through headphones closely matches what callers will hear over the phone.
So what would I recommend as the audio encoding format and file encapsulation format? We usually recommend .wav encapsulation of a 16-bit linear PCM, 8kHz, mono audio file, for two reasons: the file is self-describing, and “16-bit linear PCM” is common to all audio production software. Ideally we’d prefer to recommend u-law instead of 16-bit linear, but u-law often confuses people because it’s sometimes referred to as “mu-law” or sometimes “μ-law”. As usual, our support forum at http://support.plumgroup.com/ is there to help you work out any audio production issues you might have.
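If you are generating audio programmatically rather than exporting it from an editor, the recommended format can be sketched in a few lines of Python with the standard wave module. The function names and the test tone are just my own illustration; any source of samples would do.

```python
import math
import struct
import wave


def write_recommended_wav(path, samples, sample_rate=8000):
    """Write floats in [-1.0, 1.0] as a 16-bit linear PCM, mono .wav file."""
    frames = b"".join(
        # Clamp each sample, then scale it to a signed 16-bit little-endian value.
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate)  # 8kHz by default
        w.writeframes(frames)


def tone(freq_hz=440.0, seconds=1.0, sample_rate=8000):
    """Generate a sine wave at half amplitude as a list of floats."""
    n = int(seconds * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]
```

Calling write_recommended_wav("prompt.wav", tone()) would produce a one-second test tone; "prompt.wav" is only a placeholder name.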