Adaptive Delta Modulation

Dr. Dobb's Journal April 1998

Algorithms for audio compression

By Gary D. Knott

Gary works at Civilized Software and can be contacted at knott@civilized.com.

One common method of data compression is delta modulation, a technique typically used for digital transmission and recording of sound, video, and other signals. Delta modulation is a data-encoding technique that also compresses the sequence of sampled function values being encoded. It is particularly useful for transmitting fewer bits (over a network, for instance), and for archival storage. In this article, I'll review the basic delta-modulation algorithm, then examine variations that implement "adaptive delta modulation." For an assembly-language implementation of delta modulation, see "Audio Compression," by John W. Ratcliff (DDJ, July 1992).

The idea behind the delta-modulation representation of a signal f is quite simple. Given the initial value h=f(s0), a sampling interval α, and an increment value δ, you can interpret a string of bits b1, b2, ..., bn to obtain a specification of estimated signal values f̂(t1), f̂(t2), ..., f̂(tn), where ti=s0+iα for 0≤i≤n, as follows: Each bit bi indicates adding the constant δ if bi=1 and subtracting δ if bi=0, so that the equation in Example 1 holds: f̂(ti)=f̂(ti-1)+δ(2bi-1), with f̂(t0)=h. For example, consider f(t)=sin(t)+2cos(3t)exp(-t) if t≤6, and f(t)=-0.27614(1-exp(-2(t-6))) otherwise. Figure 1 shows a graph of f on the interval [0,11].

Let s1=tn. Choosing the parameter values s0=0, s1=11, δ=0.1, and α=0.1, you have h=2, and the delta-modulation representation is the bit sequence

00000000000000111111000000000000000000
10000101010101111011111101010010101010
101010101010101010101010101010101.

As Figure 2 illustrates, this sequence specifies f̂ drawn with the traditional step-function interpolation that would be seen on an oscilloscope. The smooth curve is the true signal, and the stepped curve is the step-function form of f̂. Thus the function f̂ on the interval 0 to 11 is represented with 110 bits.

The encoding process computes the ith bit, bi, by predicting f(ti) to be some value Pi. Then, bi is taken as 1 if f(ti)>Pi and 0 otherwise. The choice of Pi used here is simply the previous estimate value f̂(ti-1).
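To make the basic scheme concrete, here is a minimal sketch in C (not from the article, which uses MLAB; the function names are mine). It assumes the signal has already been sampled into an array f[0..n] at spacing α, and it encodes and decodes with the fixed step δ:

#define DELTA 0.1   /* fixed increment; delta = 0.1 as in the example */

/* Encode n bits from the n+1 samples f[0..n].
   Bit i is 1 if the next sample rises above the running estimate, else 0. */
void dm_encode(const double *f, int n, unsigned char *bits)
{
    double est = f[0];                      /* h = f(s0), sent to the decoder separately */
    for (int i = 0; i < n; i++) {
        bits[i] = (f[i + 1] > est);         /* predictor P_i = previous estimate */
        est += bits[i] ? DELTA : -DELTA;    /* track exactly what the decoder will compute */
    }
}

/* Decode: start at h and move up or down by DELTA for each bit. */
void dm_decode(const unsigned char *bits, int n, double h, double *est)
{
    est[0] = h;
    for (int i = 0; i < n; i++)
        est[i + 1] = est[i] + (bits[i] ? DELTA : -DELTA);
}

The essential point is that the encoder updates its estimate from its own output bits, never from f itself, so encoder and decoder stay in lockstep.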

In general, to obtain a reasonable estimate of f, the sampling interval α must be small enough that the sample count (s1-s0)/α exceeds the Nyquist rate: twice the number of cycles of the highest-frequency component present in the signal f over [s0,s1]. Also, δ must be small enough to track high-frequency oscillations without undue shape distortion. A practical way to choose α and δ is to choose δ as the largest absolute error by which f̂ may deviate from f, then choose α=δ/W, where W=max{s0≤t≤s1} |df(t)/dt| is the maximum absolute slope of f. In Figure 2, α is clearly too large for the chosen δ; the result is the large error in the initial part of the signal, called "slope overload error," where the slope is too large in magnitude for δ=0.1 and α=0.1 to track the signal.
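If f can be probed before encoding, the rule α=δ/W can be applied numerically. The following sketch (my own helper, with an arbitrary grid size) estimates W by central differences and returns α:

#include <math.h>

/* Estimate W = max |f'(t)| over [s0, s1] on a fine grid, then apply
   the rule alpha = delta / W. The grid size is an illustrative choice. */
double choose_alpha(double (*f)(double), double s0, double s1, double delta)
{
    const int N = 10000;
    double h = (s1 - s0) / N, W = 0.0;
    for (int i = 1; i < N; i++) {
        double t = s0 + i * h;
        double slope = fabs((f(t + h) - f(t - h)) / (2.0 * h));
        if (slope > W)
            W = slope;
    }
    return (W > 0.0) ? delta / W : delta;   /* constant signal: any alpha works */
}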

Even when the signal is being appropriately followed, the estimate oscillates about it. This "granular" noise is unavoidable, although its size is controlled by δ. Note that the error characteristics of the estimator are given by |f(t)-f̂(t)|<δ for s0≤t≤s1, assuming α is small enough. This is an absolute error criterion rather than a relative error criterion, and f̂ behaves like a Chebyshev approximation to f.

A delta modulation signal is sensitive to transmission error. Changing a burst of a dozen bits or so during transmission can destroy the validity of the remaining bits. However, higher sampling rates mean short burst errors are less harmful, and methods to periodically restart the delta-modulation process can be included in a practical transmission system. In general, delta modulation is a very efficient way to encode a signal. It is not clear how to define the notion of the efficiency of an approximation (as opposed to an exact encoding) in a precise information-theoretic manner, but this is an intriguing direction for investigation.

Extending Delta Modulation

You can extend the idea of basic delta modulation in several ways. One approach is to let the increment δ assume various values, depending upon the past tracking of the signal. If you output m 1s or 0s in a row (indicating a region of large absolute slope), you can increase δ, replacing δ with 3δ/2, for example. If you output m alternating 1s and 0s, you can then decrease δ, say, to 2δ/3. The new value of δ applies to the current bit being output, which forms the mth bit of the change-triggering pattern. This approach is called "adaptive delta modulation." Changing δ, however, is not always an improvement. Indeed, the signal may be such that a closely tracking, but lagging, estimate becomes an oscillating estimate with greater absolute error when adaptive delta modulation is employed. For the signal in Figure 2, with α=0.1 (too large) and δ varying within the limits 0.05 to 0.28 based on m=2 (so that two 0s or 1s in a row increase δ, while two differing bits decrease δ), you obtain the approximation in Figure 3.
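A sketch of this m=2 step-size rule in C (my naming; the 3/2 and 2/3 factors and the clamping limits are the ones used above):

/* m = 2 step-size adaptation: two equal bits in a row grow delta by 3/2,
   an alternation shrinks it to 2/3 of its value, clamped to [dmin, dmax].
   Call this once per output bit, before applying the step for that bit. */
double adapt_delta(double delta, int prev_bit, int cur_bit,
                   double dmin, double dmax)
{
    if (cur_bit == prev_bit)
        delta *= 1.5;              /* repeats suggest slope overload: lengthen step */
    else
        delta *= 2.0 / 3.0;        /* alternation suggests granular noise: shorten step */
    if (delta > dmax) delta = dmax;
    if (delta < dmin) delta = dmin;
    return delta;
}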

Another approach is to allow the sampling interval, α, to change. This is not very useful for hardware, which is more conveniently designed to use a fixed clock rate, but for data compression for digital storage purposes, varying α may allow fewer bits to be used to encode a given curve. You can increase α when the previous m bits have alternated in value, but when m 1s or 0s in a row occur, you reduce α to reestablish the fastest sampling rate. This permits large steps in slowly varying regions, but it allows relatively large deviations in the estimate to occur at turning points where f changes from being flat to sharply rising or falling. Choosing m=2 minimizes this effect, but it is still noticeable. Lagging tracking at turning points is the major flaw in most delta-modulation schemes. The step-function estimate of the signal is shown in Figure 4, where I replaced α by 1.6α, up to a limit of 0.41, whenever the previous two bits differed, and reset α to 0.05 whenever they were the same. I fixed δ=0.1 (which is too small to be completely reasonable for the range of α). I now have as estimated points (s0,f(s0)), (t1,f̂(t1)), ..., (tn,f̂(tn)) for some irregular mesh s0<t1<...<tn. If you allow both δ and α to vary as described, with δ in the interval [0.05,0.28] and α in the interval [0.05,0.41], you obtain the approximation in Figure 5.
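The corresponding m=2 rule for the sampling interval, again as a sketch with my names (g=1.6 and the 0.05 to 0.41 limits come from the Figure 4 experiment):

/* m = 2 sampling-interval adaptation: an alternation grows alpha by the
   factor g (clamped at amax); two equal bits in a row reset alpha to amin
   to reestablish the fastest sampling rate. */
double adapt_alpha(double alpha, int prev_bit, int cur_bit,
                   double g, double amin, double amax)
{
    if (cur_bit == prev_bit)
        return amin;                       /* steep region: sample fast again */
    alpha *= g;                            /* flat region: take bigger steps */
    return (alpha > amax) ? amax : alpha;
}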

To compute the bit bi, which determines the point (ti,f̂(ti)) when encoding the signal f, you form an estimate of f(ti), called Pi, where Pi predicts f(ti) given the previous estimate points (t0,f̂(t0)), (t1,f̂(t1)), ..., (ti-1,f̂(ti-1)). Then, if Pi is less than f(ti), you output bi=1; otherwise, for Pi≥f(ti), bi is output as 0.

This same predictor must be used in decoding the bit string b to compute f̂; this is why Pi depends on f̂ values, and not on f values. In this discussion, I've used the simple predictor Pi=f̂(ti-1). Other predictor schemes are possible and may provide better performance, allowing smaller δ and/or larger α values to be used. Of course, other predictors do not necessarily have the property that bi=1 iff f̂(ti-1)<f(ti).

In general, then, the decoding calculation for obtaining f̂(ti) is f̂(ti)=Pi+δi(2bi-1) for 1≤i≤n, where δi is the particular value of the increment used when bi is computed; δi is a function of b1, b2, ..., bi (recall that the new increment applies to the current bit, which completes the change-triggering pattern).
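In C, the general decoding loop might look like the following sketch, using the constant predictor and replaying the same m=2 δ adaptation the encoder used (the adapt_delta helper sketched earlier; the initial prev=1 state is an assumption that must match the encoder):

/* Decode: est[i] = P_i + delta_i * (2*b_i - 1), with P_i = est[i-1].
   delta_i is recomputed from the bit history exactly as the encoder did. */
void adm_decode(const unsigned char *bits, int n, double h,
                double delta0, double dmin, double dmax, double *est)
{
    double delta = delta0;
    int prev = 1;                     /* must agree with the encoder's initial state */
    est[0] = h;
    for (int i = 0; i < n; i++) {
        delta = adapt_delta(delta, prev, bits[i], dmin, dmax); /* applies to this bit */
        est[i + 1] = est[i] + delta * (2 * bits[i] - 1);
        prev = bits[i];
    }
}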

You could choose Pi to be determined by extrapolation on the linear or quadratic curve passing through the previous two or three points of the estimate, but experimentation shows the error in this form of predictor to be worse than the simple constant form. A more promising approach is to use an extrapolation on a polynomial that fits the previous k estimate points in the least squares (or better yet, L1-norm) sense. This works adequately, although the predictor is slow to track f on turning intervals. A more elaborate filter predictor might be worth exploring, but weighted averages of the previous k points and weighted averages of the linear predictors based on the k previous points taken two at a time also perform no better than the simple constant predictor. Thus, finding a better predictor seems difficult and the constant form seems to be the most practical choice as well as the simplest.
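For the curious, the least-squares alternative might be sketched as follows: fit a line to the previous k estimate points and extrapolate to the next sample time. This is a hypothetical helper, not from the article; it requires i≥k and falls back to the constant predictor when the fit degenerates.

/* Predict f(t_next) by the least-squares line through the k previous
   estimate points (t[j], e[j]), j = i-k .. i-1, extrapolated to t_next. */
double lsq_predict(const double *t, const double *e, int i, int k, double t_next)
{
    double st = 0, se = 0, stt = 0, ste = 0;
    for (int j = i - k; j < i; j++) {
        st  += t[j];        se  += e[j];
        stt += t[j] * t[j]; ste += t[j] * e[j];
    }
    double denom = k * stt - st * st;
    if (denom == 0.0)
        return e[i - 1];               /* degenerate fit: constant predictor */
    double slope = (k * ste - st * se) / denom;
    return (se - slope * st) / k + slope * t_next;
}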

If a particularly good predictor is used, however, the adaptive variation of δ and α becomes less useful. Indeed, with a perfect predictor Pi=f(ti), the output is all 0 bits, and α will stay at its minimum, while δ stays at its maximum. The resulting f̂ curve tracks below f with the maximum allowable error. Even using a merely good predictor means you should sharply decrease δ and perhaps increase α.

Examples 2(a) and 2(b) present algorithms for encoding and decoding an adaptive delta-modulation signal with m=2 and with the simple constant predictor function, Pi=f̂(ti-1), used previously. The constants C, δmin, δmax, g, αmin, and αmax that appear there are global parameters that may be "tuned" to the situation. I have used C=1.5, δmin=0.05, δmax=0.28, g=1.6, αmin=0.05, and αmax=0.41 in the examples.
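Examples 2(a) and 2(b) themselves are not reproduced here, but an encoder along the lines described might be sketched in C as follows (my translation of the logic in Listing One, with the constants above; treat it as illustrative rather than as the article's exact algorithm):

#include <math.h>

/* Tunable global parameters, set to the values used in the examples. */
static const double C_GROW = 1.5, DMIN = 0.05, DMAX = 0.28;
static const double G = 1.6, AMIN = 0.05, AMAX = 0.41;

/* Adaptive delta-modulation encoder, m = 2, predictor P_i = previous estimate.
   Writes up to maxbits bits and the irregular mesh t_1 < ... < t_n; returns n.
   A decoder must replay the same delta/alpha updates from the bits alone. */
int adm_encode(double (*f)(double), double s0, double s1,
               unsigned char *bits, double *tgrid, int maxbits)
{
    double t = s0, est = f(s0), delta = DMIN, alpha = AMIN;
    int prev = 1, n = 0;                       /* initial state, as in Listing One */
    while (t + alpha <= s1 && n < maxbits) {
        t += alpha;
        int b = (f(t) > est);                  /* compare signal to prediction */
        /* the new delta applies to the current bit, which completes the pattern */
        delta = (b == prev) ? fmin(delta * C_GROW, DMAX)
                            : fmax(delta / C_GROW, DMIN);
        est += delta * (2 * b - 1);
        /* alpha: reset to the fastest rate on repeats, grow on alternation */
        alpha = (b == prev) ? AMIN : fmin(G * alpha, AMAX);
        bits[n] = (unsigned char)b;
        tgrid[n] = t;
        prev = b;
        n++;
    }
    return n;
}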

There are many variations possible, based on different ranges for δ and α, and different formulas for changing them. For the example signal f, I actually do about as well with even fewer bits than used above when I let δ assume just the values 0.1 and 0.2 and let α assume just the values 0.1 and 0.2. Another idea is to compute b by the recipe: if f(t) > P - α(log(1+|f'(t)|)/k), then b ← 1; else b ← 0. This use of slope information can perhaps be marginally useful, even though it produces "lies" about the location of f. It has been suggested that an "intelligent" encoder could hold m signal values in memory and carefully compute the best block of one or more output bits based on this lookahead. Perhaps just provisional output bits could be held, so that you would hold m bits in a local memory and output each bit based upon the m-1 future bits that have been provisionally computed and stored, but it seems difficult to make a useful scheme out of this idea.

Also, when you use m>2 to adaptively change δ and/or α, you could use a decision tree over the 2^m possible m-bit patterns to pick carefully chosen δ and α modifications; this scheme does work well, but at a high cost in complexity.

The graphs presented here were produced with the Modeling Laboratory (MLAB), a program from Civilized Software for mathematical and statistical modeling (see http://www.civilized.com/). Originally developed at the National Institutes of Health, MLAB includes curve fitting, differential equations, statistics, and graphics as some of its major capabilities. Listing One includes the statements required to do such computer experiments with MLAB. When invoked with an appropriate MLAB Do command, Listing One produces matrices suitable for graphing.

It is worth noting that the step-function approximation form of drawing f̂ is somewhat deceiving. A simple straight-line interpolation is a more reasonable presentation. For example, the (δ,α)-varying estimate shown in Figure 5 is drawn again in Figure 6 using linear connecting segments. Viewing a picture such as this suggests that you might estimate f more exactly by computing the midpoints of the line segments in the graph of f̂ that cross the graph of f. But this set of crossing points is only marginally useful when filtering is used. Generally, when possible, the input f should be prefiltered to pass only frequencies below an appropriate cutoff point. In any event, the output points (t,f̂(t)) have both sampling and encoding error, and the output should be filtered to remove some of this noise. The filtering can be done with a distance-weighted smoothing transformation in software, or with an appropriate low-pass filter in hardware.
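As one concrete possibility for the software route, the sketch below applies a Gaussian distance-weighted average over the decoded points; the kernel and the bandwidth parameter are my illustrative choices, not the article's:

#include <math.h>

/* Distance-weighted smoothing of decoded points (t[i], e[i]) on an
   irregular mesh: each output is a Gaussian-weighted average of all
   points, with weights falling off over the bandwidth bw. */
void smooth_estimate(const double *t, const double *e, int n,
                     double bw, double *out)
{
    for (int i = 0; i < n; i++) {
        double num = 0.0, den = 0.0;
        for (int j = 0; j < n; j++) {
            double d = (t[i] - t[j]) / bw;
            double w = exp(-d * d);          /* Gaussian kernel weight */
            num += w * e[j];
            den += w;
        }
        out[i] = num / den;
    }
}

Larger bw removes more granular noise but also rounds off genuine turning points, so it should be chosen with δ in mind.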

Figure 7 shows the smoothed variable-δ-and-α estimate. A doubly smoothed estimate would be even better in regions of slowly changing slope.

DDJ

Listing One

FUNCTION F(T)=IF T<=6 THEN SIN(T)+2*COS(3*T)*EXP(-T) ELSE J-J*EXP(-2*(T-6));
J = -.27614; MINA = .05; MAXA = .41; MAXD = .28; MIND = .05;
T0 = 0; T1 = 11; D = MIND; A = MINA; G = 1.6; C = 1.5;
TYPE "SET T0,T1,A,D, ETC. AS DESIRED."; DO CONSOLE;

FUNCTION ADM(I)= IF T+A<=T1 THEN (B:=((PV:=P(ME[I]:=OLDP(ADM)))<=F(MT[I+1]:= (T:=T+A))))+0*(A:=NEWA(X1:=X2,X2:=B))+0*(D:=NEWD()) ELSE 1000-I;

FUNCTION OLDP(B)=PV+D*(2*B-1); FUNCTION P(H1)=H1;

FUNCTION NEWA(X1,X2)=IF X1=X2 THEN MINA ELSE MIN(G*A,MAXA); FUNCTION NEWD()=IF X1=X2 THEN MIN(D*C,MAXD) ELSE MAX(D/C,MIND);

X2 = 1; ADM = .5; T = T0; IF T1 <= T0 THEN TYPE ("null interval"); PV = F(T0); "PRE-ALLOCATE THE ARRAYS MT, ME."; MT = 0^^360; ME[360] = 0; MT[1] = T0; MB = ADM ON 1:360;

N = MAXI(MB); IF N >=360 THEN TYPE "TOO MANY POINTS.";

ME[N] = OLDP(B); ME = ME ROW 1:N; MT = MT ROW 1:N; MB = MB ROW 1:(N-1); SME = MESH(MT,MT)&'ROTATE(MESH(ME,ME),1); DELETE SME ROW (1,N+N);

ME = MT&'ME; MF = POINTS(F,MT); "MB IS THE DELTA-MODULATION BIT-VECTOR, MF IS THE SAMPLED SIGNAL POINTS, ME IS THE ESTIMATED SIGNAL POINTS, AND SME IS THE STEP-FUNCTION ESTIMATE.";


Copyright © 1998, Dr. Dobb's Journal
