*Iwan is a hardware and software developer in the department of medical bio-engineering at the Academic Medical Center in Amsterdam, Holland.*

In this month's "Algorithm Alley," Iwan describes two implementations of a radix 2 FFT algorithm, one written in C and the other in assembler. The assembler version can be linked to C or Pascal programs and can run a 1024-point FFT on a 486/66 in 36 msec, or in just 12 msec on a 60-MHz Pentium. This performance is comparable to what you can get from a dedicated digital signal processing (DSP) board--a 1024-point FFT in 1--4 msec--and is much cheaper. Figure 1 provides various performance results on a variety of processors.

When computers are involved in signal processing, an analog-to-digital converter must be used to digitize the incoming analog signal. This results in an array of samples representing the original analog signal. It is possible to perform a Discrete Fourier Transform (DFT) on the array of samples resulting in the discrete-density spectrum in Example 1, where *f _{n}* is the

When computing an N-point DFT, N2 multiplications are required. Some of those factors appear several times throughout the calculation. Using a smarter computation method called the "Fast Fourier Transform" (FFT), it is possible to skip duplicate calculations, saving a lot of execution time.

The secret behind the FFT algorithm is splitting an N-point DFT into two N/2-point DFTs. Computing an N/2-point DFT takes N^{2}/4 multiplications, so two N/2-point DFTs can be calculated with N2/2 multiplications. Note that this is half the number of calculations required to perform an N-point DFT. While some combining of the N/2-point DFTs is required to obtain the original N-point DFT, computation time is reduced almost by a factor of 2. By continuing this halving process, finally 2-point DFTs remain to be calculated. When the combination method is disregarded, FFT execution time is the result of the equation in Example 2. F(k) can be written as a combination of two N/2 point DFTs, F_{1}(k) and F_{2}(k), holding the even and odd samples of F(k), respectively; see Example 3. Note that the sequence of Q-factors is the same for both equations of F(k). This characteristic is used to reduce execution time. The equation for finding F(k) for one Q-factor is graphically represented by the so-called FFT "Butterfly" in Figure 2.

The computation of an N-point FFT can be graphically represented as a whole network of butterflies for calculating an 8-point FFT, as in Figure 3. The network in Figure 3 consists of sections that combine two N/2-point FFTs. If you read from right to left, the butterflies become smaller, resulting in the following 2-point DFTs:

F(0)=f(0)+f(1) F(1)=f(0)-f(1)

By shuffling the FFT input samples in a structured way, the output data is ordered after performing the FFT. When N is a power of two, this shuffle algorithm is relatively simple. The index of an array sample in binary format can be reversed to obtain the index of the array sample for exchange. When the first half of the array is shuffled (index 0..3), the other half automatically comes in the right place; see Table 1. To find the array index for exchanging elements, a compact and fast piece of code like Example 4 can be written in assembler in which AX holds the array index for exchange with the original element (*OldIndex*). (The file PCFFT.ASM includes code for calculating shuffle indexes and is available electronically; see page 3.)

When the computer executes an FFT, it must calculate the factor *Q ^{k}=cos(k)+i*sin(k)*, with

I've implemented the FFT algorithm described here in both C and assembler. PCFFT.C (Listing One) is the C version, while PCFFT.ASM (available electronically) is written in assembler. Listing Two is the header file for PCFFT.C. Both C and assembler versions use the same function prototype: *void Fft(float *Re, float *Im, int Pwr, int Dir);*, where *Re, Im* are pointers to arrays of (32-bit) floating-point numbers holding the real and imaginary part of the input, respectively; *Pwr *holds the size of the arrays as a power of two (for example, when a 1024-point FFT is to be calculated, *Pwr *should be equal to 10); and *Dir *determines whether an FFT (*Dir* *1*) or an inverse FFT (*Dir* *0*) should be performed.

PCFFTest.C (Listing Three) includes the PCFFT.H header file and shows how the FFT function must be called. You can determine whether to use the C or assembler version by linking the right object file.

As Figure 1 illustrates, the assembler version is the fastest and makes direct use of your system's coprocessor. Set your compiler to generate the fastest code (80x86); when using the assembler version, define the COPROC287 symbol at the assembler command line--if your system has a 80x87 coprocessor--for maximum performance.

The assembler version can also be used to link with Borland Pascal, by defining the PSCL symbol at the assembler command line. When this is done, the correct parameter passing sequence is applied. With Borland Turbo Assembler (TASM 3.0), use the command lines in Example 6 to generate the object file (PCFFT.OBJ) of your choice.

Nieland, H.M. "Fourier Analyse." *Natuur en Techniek* (no. 3, 1990).

Example 1 Performing a DFT on an array of samples results in the discrete density spectrum. Example 2 When the combination method is disregarded, FFT execution time is the result of this equation. Example 3 F(k) can be written as a combination of two N/2-point DFTs, F1(k) and F2(k), holding the even and odd samples of F(k), respectively. Figure 1 FFT execution time on PC using Borland C++ 3.1 and assembler. The assembler code reduces execution time by about another 35 percent. Figure 2 FFT "Butterfly," where the "hub" represents a summation/ subtraction point, while the arrow represents a multiplication.

Old Binary Reversed New Index Format Binary Index Format 0 000 000 0 1 001 100 4 2 010 010 2 3 011 110 6 4 100 001 1 5 101 101 5 6 110 011 3 7 111 111 7

Figure 3 8-point FFT in "Butterfly" notation. Reading the graph from left to right shows how N/2-point FFTs are combined. Reading from right to left shows how an N-point FFT is reduced to the calculation of 2-point DFTs.

MOV BX, OldIndex ; BX = Old index MOV AX, 0000H ; AX = 0 MOV CX, WordLength ; CX = Wordlength in bits CLC ; Clear carry flag NextBit: RCR BX,1 ; Rotate BX through carry right 1 bit RCL AX,1 ; Rotate AX left using carry 1 bit LOOP NextBit ; Repeat till whole word reversed

Example 5 Deriving cos ((k+1)) and sin ((k+1)) from a table of constants.

(a) For XT: TASM pcfft.asm For AT: TASM /dCOPROC287=1 pcfft.asm (b) For XT: TASM /dPSCL=1 pcfft.asm For AT: TASM /dPSCL=1 /dCOPROC287=1 pcfft.asm

/* PCFFT.C -- by J.G.G. Dobbe -- Performs an FFT on two arrays (Re, Im) of type float (can be changed). This unit is written in C and doesn't call assembler routines. */ /* --------------------- Include directive ------------------------ */ #include "pcfft.h" /* --------------------- Local variables -------------------------- */ static float CosArray[28] = { /* cos{-2pi/N} for N = 2, 4, 8, ... 16384 */ -1.00000000000000, 0.00000000000000, 0.70710678118655, 0.92387953251129, 0.98078528040323, 0.99518472667220, 0.99879545620517, 0.99969881869620, 0.99992470183914, 0.99998117528260, 0.99999529380958, 0.99999882345170, 0.99999970586288, 0.99999992646572, /* cos{2pi/N} for N = 2, 4, 8, ... 16384 */ -1.00000000000000, 0.00000000000000, 0.70710678118655, 0.92387953251129, 0.98078528040323, 0.99518472667220, 0.99879545620517, 0.99969881869620, 0.99992470183914, 0.99998117528260, 0.99999529380958, 0.99999882345170, 0.99999970586288, 0.99999992646572 }; static float SinArray[28] = { /* sin{-2pi/N} for N = 2, 4, 8, ... 16384 */ 0.00000000000000, -1.00000000000000, -0.70710678118655, -0.38268343236509, -0.19509032201613, -0.09801714032956, -0.04906767432742, -0.02454122852291, -0.01227153828572, -0.00613588464915, -0.00306795676297, -0.00153398018628, -0.00076699031874, -0.00038349518757, /* sin{2pi/N} for N = 2, 4, 8, ... 16384 */ 0.00000000000000, 1.00000000000000, 0.70710678118655, 0.38268343236509, 0.19509032201613, 0.09801714032956, 0.04906767432742, 0.02454122852291, 0.01227153828572, 0.00613588464915, 0.00306795676297, 0.00153398018628, 0.00076699031874, 0.00038349518757 }; /* --------------------- Function implementations ----------------- */ /* --------------------- ShuffleIndex ----------------------------- */ static unsigned int ShuffleIndex(unsigned int i, int WordLength) /* Function : Finds the shuffle index of array elements. The array length must be a power of two; The power is stored in "WordLength". Return value : With "i" the source array index, "ShuffleIndex" returns the destination index for shuffling. Comment : - */ { unsigned int NewIndex; unsigned char BitNr; NewIndex = 0; for (BitNr = 0; BitNr <= WordLength - 1; BitNr++) { NewIndex = NewIndex << 1; if ((i & 1) != 0) NewIndex = NewIndex + 1; i = i >> 1; } return NewIndex; } /* --------------------- Shuffle2Arr ------------------------------ */ static void Shuffle2Arr(float *a, float *b, int bitlength) /* Function : Shuffles both arrays "a" and "b". This function is called before performing the actual FFT so the array elements are in the right order after FFT. Return value : - Comment : - */ { unsigned int IndexOld, IndexNew; float temp; unsigned int N; int bitlengthtemp; bitlengthtemp = bitlength; /* Save for later use */ N = 1; /* Find array-length */ do { N = N * 2; bitlength = bitlength - 1; } while (bitlength > 0) ; /* Shuffle all elements */ for (IndexOld = 0; IndexOld <= N - 1; IndexOld++) { /* Find index to exchange elements */ IndexNew = ShuffleIndex(IndexOld, bitlengthtemp); if (IndexNew > IndexOld) { /* Exchange elements: */ temp = a[IndexOld]; /* Of array a */ a[IndexOld] = a[IndexNew]; a[IndexNew] = temp; temp = b[IndexOld]; /* Of array a */ b[IndexOld] = b[IndexNew]; b[IndexNew] = temp; } } } /* --------------------- Fft -------------------------------------- */ void Fft(float *Re, float *Im, int Pwr, int Dir) /* Function : Actual FFT algorithm. "Re" and "Im" point to start of real and imaginary arrays of numbers, "Pwr" holds the array sizes as a power of 2 while "Dir" indicates whether an FFT (Dir>=1) or an inverse FFT must be performed (Dir<=0). Return value : The transformed information is returned by "Re" and "Im" (real and imaginary part respectively). Comment : - */ { int pwrhelp; int N; int Section; int AngleCounter; int FlyDistance; int FlyCount; int index1; int index2; float tempr, tempi; float Re1, Re2, Im1, Im2; float c, s; float scale; float sqrtn; float temp; float Qr, Qi; Shuffle2Arr(Re, Im, Pwr); /* Shuffle before (i)FFT */ pwrhelp = Pwr; /* Determine size of arrs */ N = 1; do { N = N * 2; pwrhelp--; } while (pwrhelp > 0) ; if (Dir >= 1) AngleCounter = 0; /* FFT */ else AngleCounter = 14; /* Inverse FFT */ Section = 1; while (Section < N) { FlyDistance = 2 * Section; c = CosArray[AngleCounter]; s = SinArray[AngleCounter]; Qr = 1; Qi = 0; for (FlyCount = 0; FlyCount <= Section - 1; FlyCount++) { index1 = FlyCount; do { index2 = index1 + Section; /* Perform 2-Point DFT */ tempr = 1.0 * Qr * Re[index2] - 1.0 * Qi * Im[index2]; tempi = 1.0 * Qr * Im[index2] + 1.0 * Qi * Re[index2]; Re[index2] = Re[index1] - tempr; /* For Re-part */ Re[index1] = Re[index1] + tempr; Im[index2] = Im[index1] - tempi; /* For Im-part */ Im[index1] = Im[index1] + tempi; index1 = index1 + FlyDistance; } while (index1 <= (N - 1)); /* k */ /* Calculate new Q = cos(ak) + j*sin(ak) = Qr + j*Qi */ /* -2*pi */ /* with: a = ----- */ /* N */ temp = Qr; Qr = Qr*c - Qi*s; Qi = Qi*c + temp*s; } Section = Section * 2; AngleCounter = AngleCounter + 1; } if (Dir <= 0) /* Normalize for */ { /* inverse FFT only */ scale = 1.0/N; for (index1 = 0; index1 <= N - 1; index1++) { Re[index1] = scale * Re[index1]; Im[index1] = scale * Im[index1]; } } } /* ---------------------------------------------------------------- */

/* PCFFT.H -- by J.G.G. Dobbe -- Headers for PCFFT.C */ #ifndef PCFFT_H /* If not defined yet, use this file */ #define PCFFT_H /* --------------------- External function ------------------------ */ void Fft(float *Re, float *Im, int Pwr, int Dir); /* ---------------------------------------------------------------- */ #endif

/* PCFFTest.C -- by J.G.G. Dobbe -- Test program written in Turbo C/Borland C that uses the C or Assembler version of FFT. It depends on the type of FFT object file whether the C- or ASM-version is linked in. In both cases, the same FFT.H file is used. */ /* --------------------- Include directives ----------------------- */ #include <stdio.h> #include <conio.h> #include "pcfft.h" /* --------------------- Constant definition ---------------------- */ #define SIZE 16 /* Size of data arrays (re, im) */ /* --------------------- Variable definitions --------------------- */ float re[SIZE]; /* Array holding Real part of data */ float im[SIZE]; /* Array holding Imaginary part of data */ /* --------------------- Function implementations ----------------- */ /* --------------------- DispArr ---------------------------------- */ void DispArr(char *Txt) { int i; /* Loop counter */ clrscr(); /* Clear screen */ printf("\n%s:\n", Txt); /* Display header */ for (i = 0; i < SIZE; i++) /* Display data points */ printf("i = %4d: Re = %8.2f, Im = %8.2f\n", i, re[i], im[i]); printf("Press <ENTER> to continue\n"); /* Display message */ getch(); /* Wait for keystroke */ } /* --------------------- main ------------------------------------- */ int main() { int i; for (i = 0; i < SIZE; i++) /* Clear data arrays */ { re[i] = 0.0; im[i] = 0.0; } re[0] = 100.0; /* Fill array with pulse signal */ DispArr("Input Time Data"); /* Display time data (pulse signal) */ Fft(re, im, 4, 1); /* FFT on data points */ DispArr("Frequency Response (after FFT)"); /* Display freq data */ Fft(re, im, 4, -1); /* Inverse FFT on freq data */ DispArr("Time Response (after inverse FFT)"); /* Display time data (pulse signal) */ return 0; } /* ---------------------------------------------------------------- */

Copyright © 1995, *Dr. Dobb's Journal*