Inside IMA ADPCM

Dr. Dobb's Journal November 1997

By Tim Kientzle

Tim is senior technical editor for DDJ and author of the forthcoming Programmer's Guide to Sound (Addison-Wesley, 1998). He can be reached at kientzle@ddj.com.

Audio compression is a tricky business. Although the 10:1 or better compression offered by MPEG audio and Dolby AC-2 is impressive, the CPU requirements are, unfortunately, equally impressive. This is fine if you can afford to devote a high-speed CPU entirely to audio decompression, but that's not always an option. Frequently, you'll want to juggle animation or other CPU-intensive actions at the same time.

Intel's DVI video compression engine includes a method for audio compression that offers moderate compression with very low CPU requirements. This engine was later adopted by the Interactive Multimedia Association (IMA), and the audio algorithm has become widely known as "IMA ADPCM." In this article, I'll describe how IMA ADPCM works, and discuss two specific implementations of it, one by Microsoft and the other by Apple.

IMA ADPCM

IMA ADPCM stores four bits of information for each sample. The compressor takes the difference between two samples, divides it by the current step size, and uses that as the next four-bit compressed output. Conversely, the decompressor takes this four-bit value, multiplies it by the current step size, and adds the result to the previous sample to create the next sample.

To keep the amount of state information low, the step size is not stored directly. Rather, there is a table of 88 possible step sizes (Example 1), and the decompressor maintains an index into this table. These step sizes roughly follow an exponential curve. Thanks to this table, the total amount of state information is only 23 bits: seven bits for the index into the step size table and 16 bits for the current decompressed value. In embedded applications, it may be possible to keep the entire state in registers, requiring no RAM at all.

The important part of IMA ADPCM is how it adjusts the step size. Each four-bit nybble is a signed-magnitude value; the high-order bit is the sign, the remaining bits are the value from zero to seven. (Unlike the more familiar twos-complement format, signed magnitude format distinguishes between +0 and -0.)

Since the decompressor doesn't know the next sample, it has to choose a step size based on previous values. If the four-bit scaled difference is large, the next might be larger, overflowing the four-bit value. If the four-bit scaled difference is small, then you're not taking advantage of the four bits of information you have. As a result, the step size is decreased when the scaled difference gets closer to zero, and is increased when the scaled difference gets large. Example 2 shows the table used by the decompressor to update the step size. It uses the four-bit code as an index into this table, and adds the result to the current step size index.

Implementation

You now have the rough outline of the decoder; for each four-bit sample, multiply by the current step size, add the result to the previous output value to get the next output value, and finally, update the current step size.

There's one additional trick in the actual implementation: Rather than directly multiplying, a series of bit operations are used that are roughly -- but not quite -- the same. The advantage of this approach is that similar bit operations can be used in the compressor, and these operations are faster than a full integer division on many processors.

Listing One shows the full decompressor. You simply feed it a nybble at a time (IMA ADPCM specifies that the low-order nybble precedes the high-order nybble), and it returns the decoded sample.

Listing Two is the compressor. To simplify the implementation, ImaAdpcmEncode simply calls ImaAdpcmDecode directly to update the state information.

What IMA Forgot

Both Microsoft and Apple use IMA ADPCM. Microsoft uses it with WAVE files and Apple uses it with AIFF files. One of the key advantages to IMA ADPCM is that it requires so little state information. This allows you to simply dump the complete compressor state at the beginning of each packet. However, IMA failed to specify the length or format of such packets, and as a result the Microsoft and Apple codecs are completely incompatible.

Microsoft's IMA ADPCM Variant

Microsoft uses several fields in the WAVE file header to specify the length of packets, and uses a different packet format for mono and stereo sounds. The "Block Alignment" field specifies the number of bytes per packet. The "Number of Channels" field specifies the number of channels, hence, the precise packet format. In addition, WAVE IMA ADPCM files add two bytes of compressor-specific information, which are used to specify the number of samples stored in the packet. Note that the "Bits per Sample" field is always set to four.

Table 1 illustrates the mono packet's format. It begins with a four-byte header, followed by the compressed data. When you decompress a packet, you emit one sample for the header itself, then one sample for each nybble. For example, if the Block Alignment is 256, the packet will contain 505 samples (504 samples in 252 bytes plus one sample for the four-byte header).

Microsoft resets the decompressor at the beginning of each packet. Although the header has space for the complete 16-bit current sample value, the mono files I've inspected (compressed with the IMA ADPCM compressor included with Windows 95) always set the low-order byte to zero. (The stereo files do store the complete 16-bit current sample for each channel.)

A stereo packet looks very much like two separate mono packets that have been interleaved by alternating 32-bit words, as shown in Table 2. In particular, the extended field in the WAVE header contains the number of samples contained in a packet for each channel. For example, a packet size of 2048 bytes corresponds to 2041 samples in the WAVE header (one for the four-byte decompressor state plus 2040 samples from 1020 compressed bytes).

My implementation is contained in the class DecompressImaAdpcmMs (see Listing Three; the complete source code is available electronically; see "Availability," page 3). To use it, read the number of samples per packet and the number of channels from the WAVE header, and use that to create a DecompressImaAdpcmMs object. You can then call GetSamples on that object to read decompressed data. Internally, GetSamples calls the NextPacket method to read and decode the next packet whenever the two buffers are empty. NextPacket in turn calls ReadBytes to obtain additional data from the file; you'll need to implement ReadBytes appropriately.

Apple's IMA ADPCM Variant

Although Apple and Microsoft used the exact same low-level compression, their packet formats are quite different:

Apple uses the same packet format for either mono or stereo. Stereo files alternate one packet of left data with one packet of right data.
Microsoft emits one sample for the header, Apple does not. (A Microsoft packet always contains an odd number of samples. An Apple packet always contains an even number of samples.)
The Microsoft packet header uses eight bits for the step size index and stores eight or 16 bits of the current sample. Apple uses only seven bits for the step size index and stores the upper nine bits of the current sample.
Microsoft's decompressor completely resets at the start of each packet, Apple's does not.
The Microsoft packet header contains a zero byte for detecting synchronization errors. Apple does not provide a means for detecting these errors (apart from checking that the seven-bit index value falls in the range 0-88).
An Apple packet is always 34 bytes (two bytes of header and 64 compressed samples). Microsoft allows the compressor to specify the packet size; it stores the number of samples per packet in the file header.

With this laundry list in hand, it's not too hard to adapt my Microsoft IMA ADPCM compressor to handle Apple IMA ADPCM files. I've done so in my DecompressImaAdpcmApple class, which I use the same way as the DecompressImaAdpcmMs class.

As Listing Four shows, there's one interesting trick in implementing the Apple-compatible decompressor. Because the packet header only stores the upper nine bits of the current state, Apple's decompressor is careful to only reset the decompressor state when necessary. My implementation of NextPacket is equally careful: If the nine-bit value given matches the upper nine bits of my internal state, I keep my state, since it's more accurate.

Comparing Microsoft's and Apple's IMA ADPCM Codecs

From a coding point of view, Apple's implementation of IMA ADPCM is simpler, mostly because you don't need to worry about two different packet formats. It also provides slightly better quality; Microsoft throws away eight bits of decompressor state at the beginning of each mono packet, which results in a small jump in sample value at the start of each packet.

Microsoft's approach is more compact. Assuming 2048-byte stereo packets, Microsoft's approach incurs only eight bytes of overhead for each 2041 bytes of compressed data (less than 0.4 percent overhead), while Apple incurs two bytes for each 32 bytes of compressed data (over six percent overhead). It also provides lower latency; if stereo data is being compressed and decompressed simultaneously, Microsoft only incurs an eight-sample delay, while Apple must wait for a complete 64-sample packet before it can have stereo data available. However, this is not a very serious issue even at low sampling rates; at 11,025 samples per second, 64 samples is less than six milliseconds.

Conclusion

Although originally defined to be a platform-independent standard for audio-file compression, the IMA ADPCM format doesn't quite meet that requirement. The low-level compression is not a concern; it's independent of byte order and is defined using only low-level bit manipulations, which give it good performance even on processors that don't have fast integer division.

However, the differences between the Microsoft and Apple versions make it impossible to mix their codecs. You can't extract IMA ADPCM compressed data from an AIFF-C file and feed it to the Microsoft codec, nor can you feed IMA ADPCM WAVE data to the Apple codec. Fortunately, IMA ADPCM codecs are compact; the code I've provided here should allow you to easily support both common variants, and you should also be able to adapt this code to support any other IMA ADPCM variants you may encounter.

Acknowledgments

The code presented here is partially based on an implementation of Intel's DVI ADPCM algorithm developed by Jack Jansen at the Stichting Mathematisch Centrum, Amsterdam, The Netherlands.

Listing One

short ImaAdpcmDecode(unsigned char deltaCode, ImaState &state) {   // Get the current step size
   int step = _stepSizeTable[state.index];
   // Construct the difference by scaling the current step size
   // This is approximately: difference = (deltaCode+.5)*step/4
   int difference = step>>3;
   if ( deltaCode & 1 ) difference += step>>2;
   if ( deltaCode & 2 ) difference += step>>1;
   if ( deltaCode & 4 ) difference += step;
   if ( deltaCode & 8 ) difference = -difference;
   // Build the new sample
   state.previousValue += difference;
   if (state.previousValue > 32767) state.previousValue = 32767;
   else if (state.previousValue < -32768) state.previousValue = -32768;

   // Update the step for the next sample
   state.index += indexAdjustTable[deltaCode];
   if (state.index < 0) state.index = 0;
   else if (state.index > 88) state.index = 88;

   return state.previousValue;
}

Back to Article

Listing Two

unsigned char ImaAdpcmEncode(short sample, ImaState &state) {   int diff = sample - state.previousValue;
   int step = _stepSizeTable[state.index];
   int deltaCode = 0;

   // Set sign bit
   if (diff < 0) { deltaCode = 8; diff = -diff; }

   // This is essentially deltaCode = (diff<<2)/step,
   // except the roundoff is handled differently.
   if ( diff >= step ) {  deltaCode |= 4;  diff -= step;  }
   step >>= 1;
   if ( diff >= step ) {  deltaCode |= 2;  diff -= step;  }
   step >>= 1;


   if ( diff >= step ) {  deltaCode |= 1;  diff -= step;  }

   ImaAdpcmDecode(deltaCode,state);  // update state
   return deltaCode;
}

Back to Article

Listing Three

#ifndef IMAADPCM_H_INCLUDED#define IMAADPCM_H_INCLUDED

struct ImaState {
   int index;    // Index into step size table
   int previousValue; // Most recent sample value
};
// Decode/Encode a single sample and update state
short ImaAdpcmDecode(unsigned char deltaCode, ImaState &);
unsigned char ImaAdpcmEncode(short sample, ImaState &);
// Microsoft-compatible decoder
class DecompressImaAdpcmMs {
private:
   int  _channels;
   short *_samples[2];  // Left and right sample buffers
   short *_samplePtr[2]; // Pointers to current samples
   size_t   _samplesRemaining; // Samples remaining in each channel
   size_t   _samplesPerPacket; // Total samples per packet
public:
   DecompressImaAdpcmMs(int packetLength, int channels);
   ~DecompressImaAdpcmMs();
   size_t GetSamples(short *outBuff, size_t len);
private:
   unsigned char *_packet;   // Temporary buffer for packets
   size_t   _bytesPerPacket; // Size of a packet
   size_t  NextPacket();
   size_t ReadBytes(void *buffer, size_t request);
};
// Apple-compatible decoder
class DecompressImaAdpcmApple {
private:
   int _channels;
   ImaState _state[2];
   short _samples[2][64];
   short *_samplePtr[2];
   size_t   _samplesRemaining;
   size_t  NextPacket(short *sampleBuffer, ImaState &state);
public:
   DecompressImaAdpcmApple(int channels);
   size_t GetSamples(short *outBuff, size_t len);
   size_t ReadBytes(void *buffer, size_t request);
};
#endif

Back to Article

Listing Four

size_t  DecompressImaAdpcmApple::NextPacket(short *sampleBuffer, ImaState &state) {
   unsigned char _packet[34];
   // Pull in the packet and check the header
   size_t bytesRead = ReadBytes(_packet,34);
   if (bytesRead < 34) return 0;

   // Check the decompressor state
   state.index = _packet[1] & 0x7f;
   if (state.index > 88) {
      cerr << "Synchronization error.\n";
      exit(1);
   }
   // Recover upper nine bits of last sample
   short lastSample = 
      (static_cast<signed char>(_packet[0])*0x100 + 
       static_cast<signed char>(_packet[1])) & 0xff80;
   // If ours doesn't match, reset to the value in the file
   if ((state.previousValue & 0xff80) != lastSample)
      state.previousValue = lastSample;
   // Decompress nybbles
   for(int i=0; i<32; i++) {
      *sampleBuffer++ = ImaAdpcmDecode(_packet[i+2] & 0xf, state);
      *sampleBuffer++ = ImaAdpcmDecode((_packet[i+2]>>4) & 0xf, state);
   }
   return 64;
};

Back to Article


DDJ


Copyright © 1997, Dr. Dobb's Journal



Back to Table of Contents