Dr. Dobb's Journal April 1998

## Algorithms for audio compression

### By Gary D. Knott

Gary works at Civilized Software and can be contacted at knott@civilized.com.

One common method of data compression is delta modulation, a technique that's typically used for sound and/or video digital radio transmissions and recordings as well as other types of signals. Delta modulation is a data-encoding technique that also compresses the sequence of sampled function values being encoded. It is particularly useful for transmitting fewer bits (over a network, for instance), and for archival storage. In this article, I'll review the basic delta-modulation algorithm, then examine variations that implement "adaptive delta modulation." For an assembly-language implementation of delta modulation, see "Audio Compression," by John W. Ratcliff (DDJ, July 1992).

The idea behind the delta modulation representation of a signal f is quite simple. Given the initial value, h=f(s0), a sampling interval , and an increment value , you can interpret a string of bits b1, b2,...bn to obtain a specification of estimated signal values (t1), (t2),...(tn), where ti=s0+i for 0in, as follows: Each bit bi indicates adding the constant if bi=1 and subtracting if bi=0 so that the equation in Example 1 holds. For example, consider f(t)=sin(t)+2cos(3t)exp(-t) if t6; and f(t)=-.27614(1-exp(-2(t-6))) otherwise. Figure 1 shows a graph of f on the interval [0,11].

Let s1=tn. Choosing the parameter values s0=0, s1=11, =0.1, and =0.1, you have h=2 and the delta modulation representation is the binary digital sequence

00000000000000111111000000000000000000
10000101010101111011111101010010101010
101010101010101010101010101010101.

As Figure 2 illustrates, this sequence specifies drawn with the traditional step-function interpolation that would be seen on an oscilloscope. The smooth curve is the true signal, and the stepped curve is the step-function form of . Thus the function on the interval 0 to 11 is represented with 110 bits.

The encoding process computes the ith bit, bi, by predicting f(ti) to be some value Pi. Then, bi is taken as 1 if f(ti)>Pi and 0 otherwise. The choice of Pi used here is merely the previous estimate value (ti-1).

In general, to obtain a reasonable estimate for f, the sampling interval, , must be such that (s1-s0)/ is greater than the Nyquist frequency, which is twice the frequency of the highest-frequency component present in the signal f, and must be small enough to track high-frequency oscillations without undue shape distortion. A practical way to choose and is to choose as the largest absolute error by which may deviate from f, then choose =/W, where W=maxs0ts1|df(t)/dt|, the maximum absolute is slope of f. In Figure 2, is clearly too large for the chosen ; the result is the large error in the initial part of the signal, called "slope overload error," where the slope is too large in magnitude for =0.1 and =0.1 to track the signal.

Even when the signal is being appropriately followed, the estimate oscillates about it. This "granular" noise is unavoidable, although its size is controlled by . Note that the error characteristics of the estimator are given by |f(t)-(t)|< for s0ts1, assuming a is small enough. This is an absolute error criterion rather than a relative error criterion, and behaves like a Chebychev approximation to f.

A delta modulation signal is sensitive to transmission error. Changing a burst of a dozen bits or so during transmission can destroy the validity of the remaining bits. However, higher sampling rates mean short burst errors are less harmful, and methods to periodically restart the delta-modulation process can be included in a practical transmission system. In general, delta modulation is a very efficient way to encode a signal. It is not clear how to define the notion of the efficiency of an approximation (as opposed to an exact encoding) in a precise information-theoretic manner, but this is an intriguing direction for investigation.

### Extending Delta Modulation

You can extend the idea of basic delta modulation in several ways. One approach is to let the increment assume various values, depending upon the past tracking of the signal. If you output m 1s or 0s in a row (indicating a region of large absolute slope), you can increase , replacing with 3/2, for example. If you output m alternating 1s and 0s, you can then decrease , say, to 2/3. The new value of applies to the current bit being output, which forms the mth bit of the change-triggering pattern. This approach is called "adaptive delta modulation." Changing , however, is not always an improvement. Indeed, the signal may be such that a closely tracking, but lagging, estimate becomes an oscillating estimate with greater absolute error when adaptive delta modulation is employed. For the signal in Figure 2 with =0.1 (too large), and varying within the limits 0.05 to 0.28 based on m=2, so that two zeros or ones in a row increase , while two different values decrease -- you obtain the approximation in Figure 3.

Another approach is to allow the sampling interval, , to change. This is not very useful for hardware, which is more conveniently designed to use a fixed clock rate, but for data compression for digital storage purposes, varying may allow fewer bits to be used to encode a given curve. You can increase when the previous m bits have alternated in value, but when m 1s or 0s in a row occur, you reduce to reestablish the fastest sampling rate. This permits large steps in slowly varying regions, but it allows relatively large deviations in the estimate to occur at turning points where f changes from being flat to sharply rising or falling. Choosing m=2 minimizes this effect, but it is still noticeable. Lagging tracking at turning points is the major flaw in most delta-modulation schemes. The step-function estimate of the signal is shown in Figure 4, where I replaced by 1.6 up to a limit of 0.41 whenever the previous two bits were the same, and reset to 0.05 otherwise. I fixed =0.1 (which is too small to be completely reasonable for the range of ). I now have as estimated points: (s0,f(s0)), (t1, (t1)),...,(tn, (tn)) for some irregular mesh s0<t1<...<tn. If you allow and to both vary as described, with in the interval [0.05,0.28] and in the interval [0.05,0.41], you obtain the approximation in Figure 5.

To compute the bit bi, which determines the point (ti, (ti)) when encoding the signal, f, you form an estimate of f(ti), called "Pi," where Pi predicts f(ti), given the previous estimate points (t0, (t0)), (t1, (t1)),...,(ti-1, (ti-1)). Then if Pi is less than f(ti), you output bi=1; otherwise, for Pif(ti), bi is output as 0.

This same predictor must be used in decoding the bitstring, b, to compute ; this is why Pi depends on values, and not on f-values. In this discussion, I've used the simple predictor Pi=(ti-1). Other predictor schemes are possible and may provide better performance, allowing smaller and/or larger values to be used. Of course, other predictors do not necessarily have the property that bi=1 iff (ti-1)<f(ti).

In general, then, the decoding calculation for obtaining (ti) is (ti)=Pi+i(2bi-1) for 1in, where i is the particular value of the increment used when bi is computed; i is a function of b1,b2,...,bi-1.

You could choose Pi to be determined by extrapolation on the linear or quadratic curve passing through the previous two or three points of the estimate, but experimentation shows the error in this form of predictor to be worse than the simple constant form. A more promising approach is to use an extrapolation on a polynomial that fits the previous k estimate points in the least squares (or better yet, L1-norm) sense. This works adequately, although the predictor is slow to track f on turning intervals. A more elaborate filter predictor might be worth exploring, but weighted averages of the previous k points and weighted averages of the linear predictors based on the k previous points taken two at a time also perform no better than the simple constant predictor. Thus, finding a better predictor seems difficult and the constant form seems to be the most practical choice as well as the simplest.

If a particularly good predictor is used, however, the adaptive variation of and becomes less useful. Indeed, with a perfect predictor Pi=f(ti), the output is all 0 bits, and will stay at its minimum, while stays at its maximum. The resulting curve tracks below f with the maximum allowable error. Even using a merely good predictor means you should sharply decrease and perhaps increase .

Examples 2(a) and 2(b) present algorithms for encoding and decoding an adaptive delta-modulation signal with m=2 and with the simple constant predictor function, Pi=(ti-1), previously used. The constants C, min, max, g, min, and max that appear there are global parameters that may be "tuned" to the situation. I have used C=1.5, min=0.05, max=0.28, g=1.6, min=0.05, and max=0.41 in the examples.

There are many variations possible, based on different ranges for and , and different formulas for changing them. For example signal f, I actually do about as well with even fewer bits than used above when I let assume just the values 0.1 and 0.2 and let assume just the values 0.1 and 0.2. Another idea is to compute b by the recipe: if f(t)>P-a(log(1+|f1(t)|)/k) then b<-1 else b<-0. This use of slope information can perhaps be marginally useful, even though it produces "lies" about the location of f. Some suggestions have been made that an "intelligent" encoder could hold m signal values in a memory, and carefully compute the best block of one or more output bits based on this look ahead. Perhaps just provisional output bits could be held, so that you would hold m bits in a local memory and output each bit based upon the m-1 future bits that have been provisionally computed and stored, but it seems difficult to make a useful scheme out of this idea.

Also, when you use m>2 to adaptively change and/or , you could use a 2m-bit decision tree to pick carefully chosen and modifications; this scheme does work well, but at a high cost in complexity.

The graphs presented here were produced with the Modeling Laboratory (MLAB), a program from Civilized Software for mathematical and statistical modeling (see http://www.civilized.com/). Originally developed at the National Institutes of Health, MLAB includes curve fitting, differential equations, statistics, and graphics as some of its major capabilities. Listing One includes the statements required to do such computer experiments with MLAB. When invoked with an appropriate MLAB Do command, Listing One produces matrices suitable for graphing.

It is worth noting that the step-function approximation form of drawing is somewhat deceiving. A simple straight-line interpolation is a more reasonable presentation. For example, the (, )-varying estimate shown in Figure 5 is generated again in Figure 6 using linear connecting segments. Viewing a picture such as this suggests that you might estimate f more exactly if you compute the midpoints of the line segments in the graph of f that cross the graph of . But this set of crossing points is only marginally useful when filtering is used. Generally, when possible, the input, f, should be prefiltered to pass only frequencies below an appropriate cutoff point. In any event, the output points, (t,(t)), have both sampling and encoding error, and the output should be filtered to remove some of this noise. The filtering can be done with a distance-weighted smoothing transformation in software, or with an appropriate low-pass filter in hardware.

Figure 7 shows the smoothed variable and estimate. A doubly smoothed estimate would be even better in regions of slowly changing slope.

DDJ

#### Listing One

```FUNCTION F(T)=IF T<=6 THEN SIN(T)+2*COS(3*T)*EXP(-T) ELSE = J-J*EXP(-2*(T-6));
J = -.27614; MINA = .05; MAXA = .41; MAXD = .28; MIND = .05;
T0 = 0; T1 = 11; D = MIND; A = MINA; G = 1.6; C = 1.5;
TYPE "SET T0,T1,A,D, ETC. AS DESIRED."; DO CONSOLE;

(T:=T+A))))+0*(A:=NEWA(X1:=X2,X2:=B))+0*(D:=NEWD()) ELSE 1000-I;

FUNCTION OLDP(B)=PV+D*(2*B-1);
FUNCTION P(H1)=H1;

FUNCTION NEWA(X1,X2)=IF X1=X2 THEN MINA ELSE MIN(G*A,MAXA);
FUNCTION NEWD()=IF X1=X2 THEN MIN(D*C,MAXD) ELSE MAX(D/C,MIND);

X2 = 1; ADM =.5; T = T0; IF T1 <= T0 THEN TYPE ("null interval"); = PV = F(T0);
"PRE-ALLOCATE THE ARRAYS MT, ME.";
MT = 0^^360; ME[360] = 0; MT[1] = T0;MB = ADM ON 1:360;

N = MAXI(MB); IF N >=360 THEN TYPE "TOO MANY POINTS.";

ME(N) = OLDP(B); ME = ME ROW 1:N; MT = MT ROW 1:N; MB = MB ROW = 1:(N-1);
SME = MESH(MT,MT)&'ROTATE(MESH(ME,ME),1); DELETE SME ROW (1,N+N);

ME = MT&'ME; MF = POINTS(F,MT);
"MB IS THE DELTA-MODULATION BIT-VECTOR, MF IS THE SAMPLED SIGNAL POINTS, ME IS
THE ESTIMATED SIGNAL POINTS, AND SME IS THE STEP-FUNCTION ESTIMATE.";

```