# CHAPTER 32: POLYNOMIALS AND THE FFT

The straightforward method of adding two polynomials of degree n takes Θ(n) time, but the straightforward method of multiplying them takes Θ(n²) time. In this chapter, we shall show how the Fast Fourier Transform, or FFT, can reduce the time to multiply polynomials to Θ(n lg n).

Polynomials

A polynomial in the variable x is a formal sum

$$A(x) = \sum_{j=0}^{n-1} a_j x^j .$$

We call the values a0, a1, . . ., an-1 the coefficients of the polynomial. A polynomial A(x) has degree k if its highest nonzero coefficient is ak; any integer strictly greater than the degree of a polynomial is a degree-bound of that polynomial.

Polynomial addition is defined as follows: if A(x) and B(x) are polynomials of degree-bound n, then their sum is a polynomial C(x) = A(x) + B(x), also of degree-bound n. That is, if

$$A(x) = \sum_{j=0}^{n-1} a_j x^j$$

and

$$B(x) = \sum_{j=0}^{n-1} b_j x^j ,$$

then

$$C(x) = \sum_{j=0}^{n-1} c_j x^j ,$$

where cj = aj + bj for j = 0,1, . . .,n - 1. For example, if A(x) = 6x3 + 7x2 - 10x + 9 and B(x) = -2x3 + 4x - 5, then C(x) = 4x3 + 7x2 - 6x + 4.

For polynomial multiplication, if A(x) and B(x) are polynomials of degree-bound n, their product C(x) = A(x)B(x) is a polynomial of degree-bound 2n - 1. The familiar "long multiplication" of the example polynomials above proceeds as follows:

`                        6x3 + 7x2 - 10x  +  9`

`                     -  2x3       +  4x  -  5`

`                    -------------------------`

`                     - 30x3 - 35x2 + 50x - 45`

`                24x4 + 28x3 - 40x2 + 36x`

`- 12x6 - 14x5 + 20x4 - 18x3`

`---------------------------------------------`

`- 12x6 - 14x5 + 44x4 - 20x3 - 75x2 +  86x - 45`

Another way to express the product C(x) is

$$C(x) = \sum_{j=0}^{2n-2} c_j x^j ,$$

#### (32.1)

where

$$c_j = \sum_{k=0}^{j} a_k b_{j-k} .$$

#### (32.2)
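The two equations translate directly into code. Here is a minimal Python sketch of the straightforward Θ(n) addition and Θ(n²) multiplication (the helper names `add_poly` and `mul_poly` are ours, not the text's); it reproduces the worked example above, with coefficients listed from the constant term up.

```python
def add_poly(a, b):
    # Coefficient-wise sum: c_j = a_j + b_j.
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def mul_poly(a, b):
    # Convolution, equation (32.2): c_j = sum_k a_k * b_{j-k}.
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

# A(x) = 6x^3 + 7x^2 - 10x + 9 and B(x) = -2x^3 + 4x - 5.
A = [9, -10, 7, 6]
B = [-5, 4, 0, -2]
print(add_poly(A, B))  # [4, -6, 7, 4]: 4x^3 + 7x^2 - 6x + 4
print(mul_poly(A, B))  # [-45, 86, -75, -20, 44, -14, -12]
```

The second output lists the coefficients of -12x^6 - 14x^5 + 44x^4 - 20x^3 - 75x^2 + 86x - 45, matching the long multiplication.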

Note that degree(C) = degree(A) + degree(B), implying

`degree-bound(C) = degree-bound(A) + degree-bound(B) - 1`

`                ≤ degree-bound(A) + degree-bound(B).`

We shall nevertheless speak of the degree-bound of C as being the sum of the degree-bounds of A and B, since if a polynomial has degree-bound k it also has degree-bound k + 1.

Chapter outline

Section 32.1 presents two ways to represent polynomials: the coefficient representation and the point-value representation. The straightforward methods for multiplying polynomials--equations (32.1) and (32.2)--take Θ(n²) time when the polynomials are represented in coefficient form, but only Θ(n) time when they are represented in point-value form. We can, however, multiply polynomials using the coefficient representation in only Θ(n lg n) time by converting between the two representations. To see why this works, we must first study complex roots of unity, which we do in Section 32.2. Then, we use the FFT and its inverse, also described in Section 32.2, to perform the conversions. Section 32.3 shows how to implement the FFT quickly in both serial and parallel models.

This chapter uses complex numbers extensively, and the symbol i will be used exclusively to denote √-1.

# 32.1 Representation of polynomials

The coefficient and point-value representations of polynomials are in a sense equivalent; that is, a polynomial in point-value form has a unique counterpart in coefficient form. In this section, we introduce the two representations and show how they can be combined to allow multiplication of two degree-bound n polynomials in Θ(n lg n) time.

## Coefficient representation

A coefficient representation of a polynomial A(x) = a0 + a1x + · · · + an-1x^(n-1) of degree-bound n is a vector of coefficients a = (a0, a1, . . ., an-1). Evaluating A(x) at a given point x0 takes Θ(n) time using Horner's rule:

`A(x0) = a0 + x0(a1+x0(a2+ ... +x0(an-2+x0(an-1))...)).`

Similarly, adding two polynomials represented by the coefficient vectors a = (a0, a1, . . ., an-1) and b = (b0, b1, . . ., bn-1) takes Θ(n) time: we just output the coefficient vector c = (c0, c1, . . ., cn-1), where cj = aj + bj for j = 0,1, . . . ,n - 1.
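A small Python sketch of Horner's rule (the function name `horner` is ours); the coefficient vector lists a0 first:

```python
def horner(a, x0):
    # Evaluate A(x0) = a0 + x0(a1 + x0(a2 + ...)) in Theta(n) time.
    result = 0
    for coeff in reversed(a):
        result = result * x0 + coeff
    return result

# A(x) = 6x^3 + 7x^2 - 10x + 9 at x0 = 2: 6*8 + 7*4 - 20 + 9 = 65.
print(horner([9, -10, 7, 6], 2))  # 65
```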

## Point-value representation

A point-value representation of a polynomial A(x) of degree-bound n is a set of n point-value pairs

`{(x0, y0), (x1, y1), . . ., (xn-1, yn-1)}`

such that all of the xk are distinct and

`yk = A(xk)`

#### (32.3)

for k = 0, 1, . . ., n - 1. A polynomial has many different point-value representations, since any set of n distinct points x0, x1, . . ., xn-1 can be used as a basis for the representation.

Computing a point-value representation for a polynomial given in coefficient form is in principle straightforward, since all we have to do is select n distinct points x0, x1, . . ., xn-1 and then evaluate A(xk) for k = 0, 1, . . ., n - 1. With Horner's method, this n-point evaluation takes time Θ(n²). We shall see later that if we choose the xk cleverly, this computation can be accelerated to run in time Θ(n lg n).

Theorem 32.1 (Uniqueness of an interpolating polynomial)

For any set {(x0, y0), (x1, y1), . . ., (xn-1, yn-1)} of n point-value pairs such that all the xk values are distinct, there is a unique polynomial A(x) of degree-bound n such that yk = A(xk) for k = 0, 1, . . ., n - 1.

Proof The proof is based on the existence of the inverse of a certain matrix. Equation (32.3) is equivalent to the matrix equation

$$\begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^{n-1} \\ 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n-1} & x_{n-1}^2 & \cdots & x_{n-1}^{n-1} \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{pmatrix} = \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{n-1} \end{pmatrix} .$$

#### (32.4)

The matrix on the left is denoted V(x0, x1, . . ., xn-1) and is known as a Vandermonde matrix. By Exercise 31.1-10, this matrix has determinant

$$\prod_{0 \le j < k \le n-1} (x_k - x_j) ,$$

and therefore, by Theorem 31.5, it is invertible (that is, nonsingular) if the xk are distinct. Thus, the coefficients aj can be solved for uniquely given the point-value representation:

`a = V(x0, x1, . . ., xn-1)^(-1) y.`

The proof of Theorem 32.1 describes an algorithm for interpolation based on solving the set (32.4) of linear equations. Using the LU decomposition algorithms of Chapter 31, we can solve these equations in time O(n³).

A faster algorithm for n-point interpolation is based on Lagrange's formula:

$$A(x) = \sum_{k=0}^{n-1} y_k \frac{\prod_{j \ne k} (x - x_j)}{\prod_{j \ne k} (x_k - x_j)} .$$

#### (32.5)

You may wish to verify that the right-hand side of equation (32.5) is a polynomial of degree-bound n that satisfies A(xk) = yk for all k. Exercise 32.1-4 asks you how to compute the coefficients of A using Lagrange's formula in time Θ(n²).

Thus, n-point evaluation and interpolation are well-defined inverse operations that transform between the coefficient representation of a polynomial and a point-value representation.1 The algorithms described above for these problems take time Θ(n²).

1Interpolation is a notoriously tricky problem from the point of view of numerical stability. Although the approaches described here are mathematically correct, small differences in the inputs or round-off errors during computation can cause large differences in the result.
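As an illustration, here is one possible Θ(n²) interpolation sketch in Python in the spirit of equation (32.5): precompute the product ∏(x - x_j) once, then divide out one factor per term by synthetic division. Exact `Fraction` arithmetic sidesteps the stability caveat in the footnote; all names here are ours.

```python
from fractions import Fraction

def interpolate(points):
    # Lagrange interpolation, equation (32.5), using Theta(n^2)
    # arithmetic operations.
    xs = [Fraction(x) for x, _ in points]
    ys = [Fraction(y) for _, y in points]
    n = len(points)
    P = [Fraction(1)]              # coefficients of prod_j (x - x_j), constant first
    for xj in xs:
        P = [Fraction(0)] + P      # multiply the running product by x ...
        for i in range(len(P) - 1):
            P[i] -= xj * P[i + 1]  # ... then subtract xj times it
    a = [Fraction(0)] * n
    for k in range(n):
        q = [Fraction(0)] * n      # q = P / (x - x_k) by synthetic division
        q[n - 1] = P[n]
        for i in range(n - 1, 0, -1):
            q[i - 1] = P[i] + xs[k] * q[i]
        d = Fraction(0)            # d = q(x_k) = prod_{j != k} (x_k - x_j), by Horner
        for c in reversed(q):
            d = d * xs[k] + c
        for i in range(n):
            a[i] += ys[k] * q[i] / d
    return a

# Points on A(x) = x^2 recover its coefficient vector.
print(interpolate([(0, 0), (1, 1), (2, 4)]))  # [Fraction(0, 1), Fraction(0, 1), Fraction(1, 1)]
```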

The point-value representation is quite convenient for many operations on polynomials. For addition, if C(x) = A(x) + B(x), then C(xk) = A(xk) + B(xk) for any point xk. More precisely, if we have a point-value representation for A,

`{(x0, y0), (x1, y1), . . ., (xn-1, yn-1)} ,`

and for B,

`{(x0, y'0), (x1, y'1), . . ., (xn-1, y'n-1)}`

(note that A and B are evaluated at the same n points), then a point-value representation for C is

`{(x0, y0 + y'0), (x1, y1+y'1), . . ., (xn-1, yn-1+y'n-1)}.`

The time to add two polynomials of degree-bound n in point-value form is thus Θ(n).

Similarly, the point-value representation is convenient for multiplying polynomials. If C(x) = A(x)B(x), then C(xk) = A(xk)B(xk) for any point xk, and we can pointwise multiply a point-value representation for A by a point-value representation for B to obtain a point-value representation for C. We must face the problem, however, that the degree-bound of C is the sum of the degree-bounds for A and B. A standard point-value representation for A and B consists of n point-value pairs for each polynomial. Multiplying these together gives us n point-value pairs for C, but since the degree-bound of C is 2n, Theorem 32.1 implies that we need 2n point-value pairs for a point-value representation of C. We must therefore begin with "extended" point-value representations for A and for B consisting of 2n point-value pairs each. Given an extended point-value representation for A,

`{(x0,y0),(x1, y1),...,(x2n-1,y2n-1)},`

and a corresponding extended point-value representation for B,

`{(x0,y'0),(x1,y'1),...,(x2n-1,y'2n-1)},`

then a point-value representation for C is

`{(x0,y0y'0),(x1,y1y'1),...,(x2n-1,y2n-1y'2n-1)}.`

Given two input polynomials in extended point-value form, we see that the time to multiply them to obtain the point-value form of the result is Θ(n), much less than the time required to multiply polynomials in coefficient form.

Finally, we consider how to evaluate a polynomial given in point-value form at a new point. For this problem, there is apparently no approach that is simpler than converting the polynomial to coefficient form first, and then evaluating it at the new point.

## Fast multiplication of polynomials in coefficient form

We can use any points we want as evaluation points, but by choosing the evaluation points carefully, we can convert between representations in only Θ(n lg n) time. As we shall see in Section 32.2, if we choose "complex roots of unity" as the evaluation points, we can produce a point-value representation by taking the Discrete Fourier Transform (or DFT) of a coefficient vector. The inverse operation, interpolation, can be performed by taking the "inverse DFT" of point-value pairs, yielding a coefficient vector. Section 32.2 will show how the FFT performs the DFT and inverse DFT operations in Θ(n lg n) time.

Figure 32.1 shows this strategy graphically. One minor detail concerns degree-bounds. The product of two polynomials of degree-bound n is a polynomial of degree-bound 2n. Before evaluating the input polynomials A and B, therefore, we first double their degree-bounds to 2n by adding n high-order coefficients of 0. Because the vectors have 2n elements, we use "complex (2n)th roots of unity," which are denoted by the w2n terms in Figure 32.1.

Given the FFT, we have the following Θ(n lg n)-time procedure for multiplying two polynomials A(x) and B(x) of degree-bound n, where the input and output representations are in coefficient form. We assume that n is a power of 2; this requirement can always be met by adding high-order zero coefficients.

#### Figure 32.1 A graphical outline of an efficient polynomial-multiplication process. Representations on the top are in coefficient form, while those on the bottom are in point-value form. The arrows from left to right correspond to the multiplication operation. The w2n terms are complex (2n)th roots of unity.

1. Double degree-bound: Create coefficient representations of A(x) and B(x) as degree-bound 2n polynomials by adding n high-order zero coefficients to each.

2. Evaluate: Compute point-value representations of A(x) and B(x) of length 2n through two applications of the FFT of order 2n. These representations contain the values of the two polynomials at the (2n)th roots of unity.

3. Pointwise multiply: Compute a point-value representation for the polynomial C(x) = A(x)B(x) by multiplying these values together pointwise. This representation contains the value of C(x) at each (2n)th root of unity.

4. Interpolate: Create the coefficient representation of the polynomial C(x) through a single application of an FFT on 2n point-value pairs to compute the inverse DFT.

Steps (1) and (3) take time Θ(n), and steps (2) and (4) take time Θ(n lg n). Thus, once we show how to use the FFT, we will have proven the following.

Theorem 32.2

The product of two polynomials of degree-bound n can be computed in time Θ(n lg n), with both the input and output representations in coefficient form.
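A runnable Python sketch of the four-step procedure, with a simple recursive FFT standing in for the textbook procedures (the names `fft` and `multiply` are ours); integer coefficients are assumed so that the results can be rounded back to integers:

```python
import cmath

def fft(a, invert=False):
    # Evaluate the coefficient vector a (length a power of 2) at the
    # complex nth roots of unity; invert=True uses w_n^-1 instead
    # (the caller divides by n to complete the inverse DFT).
    n = len(a)
    if n == 1:
        return a[:]
    sign = -1 if invert else 1
    wn = cmath.exp(sign * 2j * cmath.pi / n)
    y0 = fft(a[0::2], invert)   # even-index coefficients
    y1 = fft(a[1::2], invert)   # odd-index coefficients
    y, w = [0] * n, 1
    for k in range(n // 2):
        t = w * y1[k]
        y[k] = y0[k] + t
        y[k + n // 2] = y0[k] - t
        w *= wn
    return y

def multiply(a, b):
    # Steps 1-4: double the degree-bound by zero-padding, evaluate,
    # pointwise multiply, and interpolate with the inverse FFT.
    n = 1
    while n < len(a) + len(b):
        n *= 2
    fa = fft(a + [0] * (n - len(a)))
    fb = fft(b + [0] * (n - len(b)))
    fc = [x * y for x, y in zip(fa, fb)]
    c = fft(fc, invert=True)
    return [round((v / n).real) for v in c[:len(a) + len(b) - 1]]

# The worked example from the start of the chapter:
print(multiply([9, -10, 7, 6], [-5, 4, 0, -2]))
# [-45, 86, -75, -20, 44, -14, -12]
```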

## Exercises

32.1-1

Multiply the polynomials A(x) = 7x3 - x2 + x - 10 and B(x) = 8x3 - 6x + 3 using equations (32.1) and (32.2).

32.1-2

Another way to evaluate a polynomial A(x) of degree-bound n at a given point x0 is to divide A(x) by the polynomial (x - x0), obtaining a quotient polynomial q(x) of degree-bound n - 1 and a remainder r, so that

`A(x) = q(x)(x - x0) + r .`

Clearly, A(x0) = r. Show how to compute the remainder r and the coefficients of q(x) in time Θ(n) from x0 and the coefficients of A.

32.1-3

Derive a point-value representation for A^rev(x) = a(n-1) + a(n-2)x + · · · + a0x^(n-1) from a point-value representation for A(x), assuming that none of the points is 0.

32.1-4

Show how to use equation (32.5) to interpolate in time Θ(n²). (Hint: First compute ∏_j (x - x_j) and ∏_{j≠k} (x_j - x_k), and then divide by (x - x_k) and (x_j - x_k) as necessary for each term. See Exercise 32.1-2.)

32.1-5

Explain what is wrong with the "obvious" approach to polynomial division using a point-value representation. Discuss separately the case in which the division comes out exactly and the case in which it doesn't.

32.1-6

Suppose we are given two sets A and B, each containing n integers in the range from 0 to 10n, and we wish to compute their Cartesian sum, defined by

`C = {x + y : x ∈ A and y ∈ B}.`

Note that the integers in C are in the range from 0 to 20n. We want to find the elements of C and the number of times each element of C is realized as a sum of elements in A and B. Show that the problem can be solved in O(n lg n) time. (Hint: Represent A and B as polynomials of degree 10n.)

# 32.2 The DFT and FFT

In Section 32.1, we claimed that if we use complex roots of unity, we can evaluate and interpolate in Θ(n lg n) time. In this section, we define complex roots of unity and study their properties, define the DFT, and then show how the FFT computes the DFT and its inverse in just Θ(n lg n) time.

## Complex roots of unity

A complex nth root of unity is a complex number w such that

`wn = 1 .`

There are exactly n complex nth roots of unity; these are e^(2πik/n) for k = 0, 1, . . . , n - 1. To interpret this formula, we use the definition of the exponential of a complex number:

`e^(iu) = cos(u) + i sin(u).`

The value

`wn = e^(2πi/n)`

#### (32.6)

is called the principal nth root of unity; all of the other complex nth roots of unity are powers of wn.

The n complex nth roots of unity,

`wn^0, wn^1, . . ., wn^(n-1),`

form a group under multiplication (see Section 33.3). This group has the same structure as the additive group (Zn, +) of integers modulo n, since wn^n = wn^0 = 1 implies that wn^j wn^k = wn^(j+k) = wn^((j+k) mod n). Similarly, wn^(-1) = wn^(n-1). Essential properties of the complex nth roots of unity are given in the following lemmas.
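The group structure is easy to check numerically; a quick Python sketch for n = 8:

```python
import cmath

n = 8
# The n complex nth roots of unity: w[k] = e^{2*pi*i*k/n}.
w = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]
for j in range(n):
    for k in range(n):
        # Group law: w_n^j * w_n^k = w_n^{(j+k) mod n}.
        assert abs(w[j] * w[k] - w[(j + k) % n]) < 1e-9
print("group law verified for n =", n)
```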

Lemma 32.3 (Cancellation lemma)

For any integers n ≥ 0, k ≥ 0, and d > 0,

`w(dn)^(dk) = wn^k .`

#### (32.7)

Proof The lemma follows directly from equation (32.6), since

$$w_{dn}^{dk} = \left(e^{2\pi i/dn}\right)^{dk} = \left(e^{2\pi i/n}\right)^{k} = w_n^k .$$

Corollary 32.4

For any even integer n > 0,

`wn^(n/2) = w2 = -1 .`

Proof The proof is left as Exercise 32.2-1.

Lemma 32.5 (Halving lemma)

If n > 0 is even, then the squares of the n complex nth roots of unity are the n/2 complex (n/2)th roots of unity.

Proof By the cancellation lemma, we have (wn^k)^2 = wn^(2k) = w(n/2)^k for any nonnegative integer k. Note that if we square all of the complex nth roots of unity, then each (n/2)th root of unity is obtained exactly twice, since

`(wn^(k+n/2))^2 = wn^(2k+n) = wn^(2k) = (wn^k)^2 .`

Thus, wn^k and wn^(k+n/2) have the same square. This property can also be proved using Corollary 32.4, since wn^(n/2) = -1 implies wn^(k+n/2) = -wn^k, and thus (wn^(k+n/2))^2 = (wn^k)^2.

As we shall see, the halving lemma is essential to our divide-and-conquer approach for converting between coefficient and point-value representations of polynomials, since it guarantees that the recursive subproblems are only half as large.
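The halving lemma can likewise be checked numerically; a short Python sketch for n = 8:

```python
import cmath

n = 8
roots_n = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]
roots_half = [cmath.exp(2j * cmath.pi * k / (n // 2)) for k in range(n // 2)]
squares = [r ** 2 for r in roots_n]
# Each (n/2)th root of unity appears exactly twice among the squares.
for r in roots_half:
    assert sum(1 for s in squares if abs(s - r) < 1e-9) == 2
print("halving lemma verified for n =", n)
```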

Lemma 32.6 (Summation lemma)

For any integer n ≥ 1 and nonzero integer k not divisible by n,

$$\sum_{j=0}^{n-1} (w_n^k)^j = 0 .$$

Proof Because equation (3.3) applies to complex values,

$$\sum_{j=0}^{n-1} (w_n^k)^j = \frac{(w_n^k)^n - 1}{w_n^k - 1} = \frac{(w_n^n)^k - 1}{w_n^k - 1} = \frac{(1)^k - 1}{w_n^k - 1} = 0 .$$

Requiring that k not be divisible by n ensures that the denominator is not 0, since wn^k = 1 only when k is divisible by n.

## The DFT

Recall that we wish to evaluate a polynomial

$$A(x) = \sum_{j=0}^{n-1} a_j x^j$$

of degree-bound n at wn^0, wn^1, . . ., wn^(n-1) (that is, at the n complex nth roots of unity).2 Without loss of generality, we assume that n is a power of 2, since a given degree-bound can always be raised--we can always add new high-order zero coefficients as necessary. We assume that A is given in coefficient form: a = (a0, a1, . . . , an-1). Let us define the results yk, for k = 0, 1, . . . , n - 1, by

$$y_k = A(w_n^k) = \sum_{j=0}^{n-1} a_j w_n^{kj} .$$

#### (32.8)

The vector y = (y0, y1, . . ., yn-1) is the discrete Fourier transform (DFT) of the coefficient vector a = (a0, a1, . . ., an-1); we also write y = DFTn(a).

2The length n is actually what we referred to as 2n in Section 32.1, since we double the degree-bound of the given polynomials prior to evaluation. In the context of polynomial multiplication, therefore, we are actually working with complex (2n)th roots of unity.
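Equation (32.8) can be transcribed literally into a Θ(n²) routine (the name `dft` is ours); the FFT below computes the same vector faster:

```python
import cmath

def dft(a):
    # Direct evaluation of equation (32.8): y_k = sum_j a_j * w_n^{kj},
    # where w_n = e^{2*pi*i/n}.  Theta(n^2) time.
    n = len(a)
    return [sum(a[j] * cmath.exp(2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]
```

For example, `dft([1, 1, 1, 1])` is (4, 0, 0, 0) up to floating-point error: y0 sums the coefficients, and the other components vanish by the summation lemma.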

## The FFT

The FFT method employs a divide-and-conquer strategy, using the even-index and odd-index coefficients of A(x) separately to define the two new degree-bound n/2 polynomials A[0](x) and A[1](x):

`A[0](x)  =  a0 + a2x + a4x^2 + · · · + a(n-2)x^(n/2-1),`

`A[1](x)  =  a1 + a3x + a5x^2 + · · · + a(n-1)x^(n/2-1).`

Note that A[0] contains all the even-index coefficients of A (the binary representation of the index ends in 0) and A[1] contains all the odd-index coefficients (the binary representation of the index ends in 1). It follows that

`A(x) = A[0](x2) + xA[1](x2),`

#### (32.9)

so that the problem of evaluating A(x) at wn^0, wn^1, . . ., wn^(n-1) reduces to

1. evaluating the degree-bound n/2 polynomials A[0](x) and A[1](x) at the points

`(wn^0)^2, (wn^1)^2, . . ., (wn^(n-1))^2,`

#### (32.10)

and then

2. combining the results according to equation (32.9).

The following procedure RECURSIVE-FFT computes the DFT of a vector a = (a0, a1, . . ., an-1), where n is a power of 2.

`RECURSIVE-FFT(a)`

`1  n ← length[a]        ▷ n is a power of 2.`

`2  if n = 1`

`3      then return a`

`4  wn ← e^(2πi/n)`

`5  w ← 1`

`6  a[0] ← (a0, a2, . . ., an-2)`

`7  a[1] ← (a1, a3, . . ., an-1)`

`8  y[0] ← RECURSIVE-FFT(a[0])`

`9  y[1] ← RECURSIVE-FFT(a[1])`

`10 for k ← 0 to n/2 - 1`

`11     do yk ← yk[0] + w yk[1]`

`12        y(k+n/2) ← yk[0] - w yk[1]`

`13        w ← w wn`

`14 return y`

The RECURSIVE-FFT procedure works as follows. Lines 2-3 represent the basis of the recursion; the DFT of one element is the element itself, since in this case

`y0 = a0 w1^0 = a0 .`

Lines 6-7 define the coefficient vectors for the polynomials A[0] and A[1]. Lines 4, 5, and 13 guarantee that w is updated properly so that whenever lines 11-12 are executed, w = wn^k. (Keeping a running value of w from iteration to iteration saves time over computing wn^k from scratch each time through the for loop.) Lines 8-9 perform the recursive DFT(n/2) computations, setting, for k = 0, 1, . . . , n/2 - 1,

`yk[0] = A[0](w(n/2)^k) ,`

`yk[1] = A[1](w(n/2)^k) ,`

or, since w(n/2)^k = wn^(2k) by the cancellation lemma,

`yk[0] = A[0](wn^(2k)) ,`

`yk[1] = A[1](wn^(2k)) .`

Lines 11-12 combine the results of the recursive DFT(n/2) calculations. For y0, y1, . . . , y(n/2-1), line 11 yields

$$\begin{aligned} y_k &= y_k^{[0]} + w_n^k y_k^{[1]} \\ &= A^{[0]}(w_n^{2k}) + w_n^k A^{[1]}(w_n^{2k}) \\ &= A(w_n^k) , \end{aligned}$$

where the last line of this argument follows from equation (32.9). For y(n/2), y(n/2+1), . . . , y(n-1), letting k = 0, 1, . . . , n/2 - 1, line 12 yields

$$\begin{aligned} y_{k+(n/2)} &= y_k^{[0]} - w_n^k y_k^{[1]} \\ &= y_k^{[0]} + w_n^{k+(n/2)} y_k^{[1]} \\ &= A^{[0]}(w_n^{2k}) + w_n^{k+(n/2)} A^{[1]}(w_n^{2k}) \\ &= A^{[0]}(w_n^{2k+n}) + w_n^{k+(n/2)} A^{[1]}(w_n^{2k+n}) \\ &= A(w_n^{k+(n/2)}) . \end{aligned}$$

The second line follows from the first since wn^(k+n/2) = -wn^k. The fourth line follows from the third because wn^n = 1 implies wn^(2k) = wn^(2k+n). The last line follows from equation (32.9). Thus, the vector y returned by RECURSIVE-FFT is indeed the DFT of the input vector a.

To determine the running time of procedure RECURSIVE-FFT, we note that exclusive of the recursive calls, each invocation takes time Θ(n), where n is the length of the input vector. The recurrence for the running time is therefore

`T(n) = 2T(n/2) + Θ(n)`

`     = Θ(n lg n) .`

Thus, we can evaluate a polynomial of degree-bound n at the complex nth roots of unity in time Θ(n lg n) using the Fast Fourier Transform.

## Interpolation at the complex roots of unity

From equation (32.4), we can write the DFT as the matrix product y = Vn a, where Vn is the Vandermonde matrix containing the appropriate powers of wn:

$$V_n = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & w_n & w_n^2 & \cdots & w_n^{n-1} \\ 1 & w_n^2 & w_n^4 & \cdots & w_n^{2(n-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & w_n^{n-1} & w_n^{2(n-1)} & \cdots & w_n^{(n-1)(n-1)} \end{pmatrix} .$$

The (k, j) entry of Vn is wn^(kj), for j, k = 0, 1, . . . , n - 1, and the exponents of the entries of Vn form a multiplication table.

For the inverse operation, which we write as a = DFTn^(-1)(y), we proceed by multiplying y by the matrix Vn^(-1), the inverse of Vn.

Theorem 32.7

For j, k = 0, 1, . . . , n - 1, the (j, k) entry of Vn^(-1) is wn^(-kj)/n.

Proof We show that Vn^(-1) Vn = In, the n × n identity matrix. Consider the (j, j') entry of Vn^(-1) Vn:

$$[V_n^{-1} V_n]_{jj'} = \sum_{k=0}^{n-1} \left(w_n^{-kj}/n\right)\left(w_n^{kj'}\right) = \frac{1}{n} \sum_{k=0}^{n-1} w_n^{k(j'-j)} .$$

This summation equals 1 if j' = j, and it is 0 otherwise by the summation lemma (Lemma 32.6). Note that we rely on -(n - 1) ≤ j' - j ≤ n - 1, so that j' - j is not divisible by n, in order for the summation lemma to apply.

Given the inverse matrix Vn^(-1), we have that DFTn^(-1)(y) is given by

$$a_j = \frac{1}{n} \sum_{k=0}^{n-1} y_k w_n^{-kj}$$

#### (32.11)

for j = 0,1, . . . , n - 1. By comparing equations (32.8) and (32.11), we see that by modifying the FFT algorithm to switch the roles of a and y, replace wn by wn^(-1), and divide each element of the result by n, we compute the inverse DFT (see Exercise 32.2-4). Thus, DFTn^(-1) can be computed in Θ(n lg n) time as well.
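A naive Θ(n²) sketch of equation (32.11) alongside the forward transform of equation (32.8) (names ours); the same switch of wn to wn^(-1) inside an FFT gives the Θ(n lg n) version:

```python
import cmath

def dft(a):
    # Forward transform, equation (32.8): y_k = sum_j a_j * w_n^{kj}.
    n = len(a)
    return [sum(a[j] * cmath.exp(2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]

def inverse_dft(y):
    # Equation (32.11): a_j = (1/n) sum_k y_k * w_n^{-kj}.
    n = len(y)
    return [sum(y[k] * cmath.exp(-2j * cmath.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]
```

The round trip `inverse_dft(dft(a))` recovers a up to floating-point error, which is exactly the statement that the two operations are inverses.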

Thus, by using the FFT and the inverse FFT, we can transform a polynomial of degree-bound n back and forth between its coefficient representation and a point-value representation in time Θ(n lg n). In the context of polynomial multiplication, we have shown the following.

Theorem 32.8 (Convolution theorem)

For any two vectors a and b of length n, where n is a power of 2,

`a ⊗ b = DFT2n^(-1)(DFT2n(a) · DFT2n(b)) ,`

where the vectors a and b are padded with 0's to length 2n and · denotes the componentwise product of two 2n-element vectors.

## Exercises

32.2-1

Prove Corollary 32.4.

32.2-2

Compute the DFT of the vector (0, 1, 2, 3).

32.2-3

Do Exercise 32.1-1 by using the Θ(n lg n)-time scheme.

32.2-4

Write pseudocode to compute DFTn^(-1) in Θ(n lg n) time.

32.2-5

Describe the generalization of the FFT procedure to the case in which n is a power of 3. Give a recurrence for the running time, and solve the recurrence.

32.2-6

Suppose that instead of performing an n-element FFT over the field of complex numbers (where n is even), we use the ring Zm of integers modulo m, where m = 2^(tn/2) + 1 and t is an arbitrary positive integer. Use w = 2^t instead of wn as a principal nth root of unity, modulo m. Prove that the DFT and the inverse DFT are well defined in this system.

32.2-7

Given a list of values z0, z1, . . . , zn-1 (possibly with repetitions), show how to find the coefficients of the polynomial P(x) of degree-bound n + 1 that has zeros only at z0, z1, . . . , zn-1 (possibly with repetitions). Your procedure should run in time O(n lg² n). (Hint: The polynomial P(x) has a zero at zj if and only if P(x) is a multiple of (x - zj).)

to view the chirp transform as a convolution.)

# 32.3 Efficient FFT implementations

Since the practical applications of the DFT, such as signal processing, demand the utmost speed, this section examines two efficient FFT implementations. First, we shall examine an iterative version of the FFT algorithm that runs in Θ(n lg n) time but has a lower constant hidden in the Θ-notation than the recursive implementation in Section 32.2. Then, we shall use the insights that led us to the iterative implementation to design an efficient parallel FFT circuit.

## An iterative FFT implementation

We now show how to make the FFT algorithm iterative rather than recursive in structure. In Figure 32.4, we have arranged the input vectors to the recursive calls in an invocation of RECURSIVE-FFT in a tree structure, where the initial call is for n = 8. The tree has one node for each call of the procedure, labeled by the corresponding input vector. Each RECURSIVE-FFT invocation makes two recursive calls, unless it has received a 1-element vector. We make the first call the left child and the second call the right child.

Looking at the tree, we observe that if we could arrange the elements of the initial vector a into the order in which they appear in the leaves, we could mimic the execution of the RECURSIVE-FFT procedure as follows. First, we take the elements in pairs, compute the DFT of each pair using one butterfly operation, and replace the pair with its DFT. The vector then holds n/2 2-element DFT's. Next, we take these n/2 DFT's in pairs and compute the DFT of the four vector elements they come from by executing two butterfly operations, replacing two 2-element DFT's with one 4-element DFT. The vector then holds n/4 4-element DFT's. We continue in this manner until the vector holds two (n/2)-element DFT's, which we can combine using n/2 butterfly operations into the final n-element DFT.

To turn this observation into code, we use an array A[0 . . n - 1] that initially holds the elements of the input vector a in the order in which they appear in the leaves of the tree of Figure 32.4. (We shall show later how to determine this order.) Because the combining has to be done on each level of the tree, we introduce a variable s to count the levels, ranging from 1 (at the bottom, when we are combining pairs to form 2-element DFT's) to lg n (at the top, when we are combining two (n/2)-element DFT's to produce the final result). The algorithm therefore has the following structure:

`1  for s ← 1 to lg n`

`2       do for k ← 0 to n - 1 by 2^s`

`3              do combine the two 2^(s-1)-element DFT's in`

`A[k . . k + 2^(s-1) - 1] and A[k + 2^(s-1) . . k + 2^s - 1]`

`into one 2^s-element DFT in A[k . . k + 2^s - 1]`

We can express the body of the loop (line 3) as more precise pseudocode. We copy the for loop from the RECURSIVE-FFT procedure, identifying y[0] with A[k . . k + 2^(s-1) - 1] and y[1] with A[k + 2^(s-1) . . k + 2^s - 1]. The value of w used in each butterfly operation depends on the value of s; we use wm, where m = 2^s. (We introduce the variable m solely for the sake of readability.) We introduce another temporary variable u that allows us to perform the butterfly operation in place. When we replace line 3 of the overall structure by the loop body, we get the following pseudocode, which forms the basis of our final iterative FFT algorithm as well as the parallel implementation we shall present later.

`FFT-BASE(a)`

`1  n ← length[a]        ▷ n is a power of 2.`

`2  for s ← 1 to lg n`

`3       do m ← 2^s`

`4          wm ← e^(2πi/m)`

`5          for k ← 0 to n - 1 by m`

`6              do w ← 1`

`7                 for j ← 0 to m/2 - 1`

`8                     do t ← w A[k + j + m/2]`

`9                        u ← A[k + j]`

`10                        A[k + j] ← u + t`

`11                        A[k + j + m/2] ← u - t`

`12                        w ← w wm`

We now present the final version of our iterative FFT code, which inverts the two inner loops to eliminate some index computation and uses the auxiliary procedure BIT-REVERSE-COPY(a, A) to copy vector a into array A in the initial order in which we need the values.

`ITERATIVE-FFT(a)`

`1  BIT-REVERSE-COPY(a, A)`

`2  n ← length[a]        ▷ n is a power of 2.`

`3  for s ← 1 to lg n`

`4        do m ← 2^s`

`5           wm ← e^(2πi/m)`

`6           w ← 1`

`7           for j ← 0 to m/2 - 1`

`8               do for k ← j to n - 1 by m`

`9                      do t ← w A[k + m/2]`

`10                         u ← A[k]`

`11                         A[k] ← u + t`

`12                         A[k + m/2] ← u - t`

`13                  w ← w wm`

`14  return A`

How does BIT-REVERSE-COPY get the elements of the input vector a into the desired order in the array A? The order in which the leaves appear in Figure 32.4 is "bit-reverse binary." That is, if we let rev(k) be the lg n-bit integer formed by reversing the bits of the binary representation of k, then we want to place vector element ak in array position A[rev(k)]. In Figure 32.4, for example, the leaves appear in the order 0, 4, 2, 6, 1, 5, 3, 7; this sequence in binary is 000, 100, 010, 110, 001, 101, 011, 111, and in bit-reverse binary we get the sequence 000, 001, 010, 011, 100, 101, 110, 111. To see that we want bit-reverse binary order in general, we note that at the top level of the tree, indices whose low-order bit is 0 are placed in the left subtree and indices whose low-order bit is 1 are placed in the right subtree. Stripping off the low-order bit at each level, we continue this process down the tree, until we get the bit-reverse binary order at the leaves.

Since the function rev(k) is easily computed, the BIT-REVERSE-COPY procedure can be written as follows.

`BIT-REVERSE-COPY(a, A)`

`1  n ← length[a]`

`2  for k ← 0 to n - 1`

`3       do A[rev(k)] ← ak`

The iterative FFT implementation runs in time Θ(n lg n). The call to BIT-REVERSE-COPY(a, A) certainly runs in O(n lg n) time, since we iterate n times and can reverse an integer between 0 and n - 1, with lg n bits, in O(lg n) time. (In practice, we usually know the initial value of n in advance, so we would probably code a table mapping k to rev(k), making BIT-REVERSE-COPY run in Θ(n) time with a low hidden constant. Alternatively, we could use the clever amortized reverse binary counter scheme described in Problem 18-1.) To complete the proof that ITERATIVE-FFT runs in time Θ(n lg n), we show that L(n), the number of times the body of the innermost loop (lines 9-12) is executed, is Θ(n lg n). We have

$$L(n) = \sum_{s=1}^{\lg n} \frac{n}{2^s} \cdot 2^{s-1} = \sum_{s=1}^{\lg n} \frac{n}{2} = \Theta(n \lg n) .$$
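The iterative scheme transcribes readily into Python; this sketch mirrors the loop structure of FFT-BASE together with BIT-REVERSE-COPY (function names are ours):

```python
import cmath

def bit_reverse_copy(a):
    # Place a_k at position rev(k), where rev reverses lg n bits.
    n = len(a)
    bits = n.bit_length() - 1
    A = [0] * n
    for k in range(n):
        rev = int(format(k, '0%db' % bits)[::-1], 2)
        A[rev] = a[k]
    return A

def iterative_fft(a):
    # Bottom-up FFT: combine 2^(s-1)-element DFT's into 2^s-element
    # DFT's, one stage s at a time, using in-place butterflies.
    A = bit_reverse_copy(a)
    n = len(A)
    s = 1
    while (1 << s) <= n:
        m = 1 << s
        wm = cmath.exp(2j * cmath.pi / m)
        for k in range(0, n, m):
            w = 1
            for j in range(m // 2):
                t = w * A[k + j + m // 2]
                u = A[k + j]
                A[k + j] = u + t
                A[k + j + m // 2] = u - t
                w *= wm
        s += 1
    return A

print(bit_reverse_copy(list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

The printed permutation matches the leaf order of Figure 32.4.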

## A parallel FFT circuit

The leftmost part of the circuit PARALLEL-FFT performs the bit-reverse permutation, and the remainder mimics the iterative FFT-BASE procedure. We take advantage of the fact that each iteration of the outermost for loop performs n/2 independent butterfly operations that can be performed in parallel. The value of s in each iteration within FFT-BASE corresponds to a stage of butterflies shown in Figure 32.5. Within stage s, for s = 1, 2, . . . , lg n, there are n/2^s groups of butterflies (corresponding to each value of k in FFT-BASE), with 2^(s-1) butterflies per group (corresponding to each value of j in FFT-BASE). The butterflies shown in Figure 32.5 correspond to the butterfly operations of the innermost loop (lines 8-11 of FFT-BASE). Note also that the values of w used in the butterflies correspond to those used in FFT-BASE: in stage s, we use wm^0, wm^1, . . ., wm^(m/2-1), where m = 2^s.

## Exercises

32.3-1

Show how ITERATIVE-FFT computes the DFT of the input vector (0, 2, 3, -1, 4, 5, 7, 9).

32.3-2

Show how to implement an FFT algorithm with the bit-reversal permutation occurring at the end, rather than at the beginning, of the computation. (Hint: Consider the inverse DFT.)

32.3-3

To compute DFTn, how many addition, subtraction, and multiplication elements, and how many wires, are needed in the PARALLEL-FFT circuit described in this section? (Assume that only one wire is needed to carry a number from one place to another.)

32.3-4

Suppose that the adders in the FFT circuit sometimes fail in such a manner that they always produce a zero output, independent of their inputs. Suppose that exactly one adder has failed, but that you don't know which one. Describe how you can identify the failed adder by supplying inputs to the overall FFT circuit and observing the outputs. Try to make your procedure efficient.

# Problems

b. Give two divide-and-conquer algorithms for multiplying two polynomials of degree-bound n that run in time Θ(n^(lg 3)). The first algorithm should divide the input polynomial coefficients into a high half and a low half, and the second algorithm should divide them according to whether their index is odd or even.

c. Show that two n-bit integers can be multiplied in O(n^(lg 3)) steps, where each step operates on at most a constant number of 1-bit values.

a. Is the sum of two Toeplitz matrices necessarily Toeplitz? What about the product?

b. Describe how to represent a Toeplitz matrix so that two n × n Toeplitz matrices can be added in O(n) time.

c. Give an O(n lg n)-time algorithm for multiplying an n × n Toeplitz matrix by a vector of length n. Use your representation from part (b).

d. Give an efficient algorithm for multiplying two n × n Toeplitz matrices. Analyze its running time.

From the coefficient representation (a0, a1, . . . , an-1) of A(x) and a given point x0, we wish to determine A^(t)(x0) for t = 0, 1, . . . , n - 1.

a. Given coefficients b0, b1, . . . , bn-1 such that

$$A(x) = \sum_{j=0}^{n-1} b_j (x - x_0)^j ,$$

show how to compute A^(t)(x0), for t = 0, 1, . . . , n - 1, in O(n) time.

b. Explain how to find b0, b1, . . . , bn-1 in O(n lg n) time, given for k = 0, 1, . . . , n - 1.

c. Prove that

where f(j) = aj j! and

d. Explain how to evaluate for k = 0, 1, . . . , n - 1 in O(n lg n) time. Conclude that all nontrivial derivatives of A(x) can be evaluated at x0 in O(n lg n) time.

To do so, we shall use the fact that we can compute the polynomial remainder when one such polynomial is divided by another in O(n lg n) time, a result that we assume without proof. For example, the remainder of 3x3 + x2 - 3x + 1 when divided by x2 + x + 2 is

`(3x3 + x2 - 3x + 1) mod (x2 + x + 2) = -7x + 5.`

Given the coefficient representation of a polynomial A(x) of degree-bound n and n points x0, x1, . . . , xn-1, we wish to compute the n values A(x0), A(x1), . . . , A(xn-1). For 0 ≤ i ≤ j ≤ n - 1, define the polynomials p_ij(x) = ∏_{k=i}^{j} (x - xk) and Q_ij(x) = A(x) mod p_ij(x). Note that Q_ij(x) has degree at most j - i.

a. Prove that A(x) mod (x - z) = A(z) for any point z.

b. Prove that Qkk(x) = A(xk) and that Q0,n-1(x) = A(x).

c. Prove that for i ≤ k ≤ j, we have Q_ik(x) = Q_ij(x) mod p_ik(x) and Q_kj(x) = Q_ij(x) mod p_kj(x).

d. Give an O(n lg² n)-time algorithm to evaluate A(x0), A(x1), . . . , A(xn-1).

a. Suppose that we search for the smallest k such that p = kn + 1 is prime. Give a simple heuristic argument why we might expect k to be approximately lg n. (The value of k might be much larger or smaller, but we can reasonably expect to examine O(lg n) candidate values of k on average.) How does the expected length of p compare to the length of n?

Let g be a generator of Z*p, and let w = g^k mod p.

b. Argue that the DFT and the inverse DFT are well-defined inverse operations modulo p, where w is used as a principal nth root of unity.

c. Argue that the FFT and its inverse can be made to work modulo p in time O(n lg n), where operations on words of O(lg n) bits take unit time. Assume that the algorithm is given p and w.

d. Compute the DFT modulo p = 17 of the vector (0, 5, 3, 7, 7, 2, 1, 6). Note that g = 3 is a generator of Z*17.

# Chapter notes

Press, Flannery, Teukolsky, and Vetterling [161, 162] have a good description of the Fast Fourier Transform and its applications. For an excellent introduction to signal processing, a popular FFT application area, see the text by Oppenheim and Willsky [153].

Cooley and Tukey [51] are widely credited with devising the FFT in the 1960's. The FFT had in fact been discovered many times previously, but its importance was not fully realized before the advent of modern digital computers. Press, Flannery, Teukolsky, and Vetterling attribute the origins of the method to Runge and König (1924).