*Dr. Dobb's Journal* April 1997

From XModem to TCP, most communications protocols use some form of error-detecting signature. The sender computes a special value and "signs" the packet of data with that value. When the receiver sees the data, it computes its own signature and compares it with the one received from the sender. An error in transferring the data will hopefully cause these two values to disagree. The two most popular error-detecting signatures are checksums and cyclic redundancy checks (CRCs).

Although simple, checksums lack the robust mathematical properties of CRCs. Checksums are the easiest signatures to understand and may well be the most widely used type of signature. A checksum simply adds up the values of each byte of data to obtain a signature value. Unfortunately, checksums aren't particularly effective.

To detect a large number of errors, the signature values must be uniformly distributed -- on random data any signature value should be about as likely as any other. Unfortunately, checksums do not have this property, due to a statistical fact known as the "Law of Large Numbers." (This is the same principle that says seven is more likely than two when you roll a pair of dice.) To get well-distributed signatures, you need to look elsewhere.

There are many different CRCs, all computed in essentially the same fashion. The most important aspect of a CRC is its length. Longer CRCs are more effective than short ones at detecting errors. A 12-bit CRC will miss only one error out of 2^{12}, while a 32-bit CRC will miss only one error in 2^{32}.

CRCs are characterized by a string of bits called the "generating polynomial." Each different generating polynomial produces a different CRC. There are also a number of minor differences in how CRCs are computed. If you want to compute a compatible CRC, you need to understand these differences.

CRCs can be computed nearly as quickly as checksums, using table-driven algorithms (such as those presented here). These implementations are both compact and fast. CRCs are also easy to compute in hardware.

Because CRCs are easily implemented, they are widely used. They are used to detect errors in floppy- and hard-disk interfaces, network software and hardware, program loaders, backup software, revision-control systems, and the like.

The generating polynomial is written either as a sum of powers of *x *or as a string of bits. Table 1 lists three popular generating polynomials. It's not important at this point that you know what a generating polynomial is, just that different generating polynomials give different CRCs.

The relationship between the binary and polynomial forms is just that the powers of *x* indicate which bits in the binary form are set, with the convention that 1 in the polynomial form is thought of as *x*^{0}. So, for the generating polynomial *x*^{16}+*x*^{12}+*x*^{5}+1, bits 16, 12, 5, and 0 are set.

CRCs are calculated on a stream of bits. I'll start with a simple calculator that computes the 16-bit CRC recommended by the CCITT, which creates international communications standards. The calculator in Listing One isn't efficient, but it will help explain those that follow.

CRC calculations consist of alternating two operations: collect the next bit(s), and reduce the value so far. In Listing One, the function is called once for each bit of data. The shift and exclusive-or puts this bit into the *longcrc* variable, and the *if* statement handles the reduction. Listing Two shows a modification (checking the top bit before the shift rather than after) that eliminates the need for a long temporary variable.

If you look at this operation over the eight bits in a byte, the eight reductions don't depend on the bits being shifted into the low-order end -- they depend only on what was already in the high-order byte. Thus, you can rearrange the calculation to collect eight bits, then do eight reductions (or vice versa). By precomputing the reductions in a table, you end up with the very efficient algorithm shown in Listing Three. *InitCrc16* precomputes the reductions for all 256 possible byte values, using exactly the algorithm in the previous section.

Using the precomputed values in the table, *Crc16()* can rapidly update the CRC value. The table itself is small enough to comfortably fit in the L1 cache on most processors. To make *Crc16()* even faster, it could be modified to accept a pointer and a count, removing the need for a function call for each byte of data.

If you compare the previous code with the CCITT-CRC16 computations for YModem and ZModem, you'll notice that the table is identical, but the CRC calculation itself (Listing Four) is slightly rearranged. There's an interesting mathematical reason for this, but here's a relevant nonmathematical observation: In the first version, if you initialize the *crc* value to zero, the first two bytes simply get shifted directly into the CRC (the zero entry in the table is always zero). That is, the *CRC16* of a 2-byte buffer is just those two bytes. Listing Four doesn't have this property. You can get the exact same result from Listing Three by adding two zero bytes to the end of your data and including those in the CRC calculation.

The *CRC16* in Listing Five is another variation. Kermit computes its CRC backwards, shifting the CRC to the right rather than left. This reversed direction means you test the least-significant bit rather than the most-significant bit, and the CRC constant is reversed from 0×1021 to 0×8408.

Other CRCs can be computed similarly. Listing Six shows how to compute a common 32-bit CRC. Notice that the left shift shifts by 8, because it's computing the CRC one byte at a time. The right shift aligns the high-order byte, and therefore is a shift by 24. With minor modifications, this code can be used for any CRC of at least eight bits. Note that ZModem computes its 32-bit CRC backwards from the *Crc32* function here, just as the Kermit CRC16 is backwards from the X/Y/ZModem CRC16.

Common CRC calculators show a number of other minor variations, none of which affect the mathematical effectiveness of the CRC:

- Multibyte CRCs can be sent in any byte order.
- The CRC may be initialized to all 0s or all 1s.
- The final CRC is often complemented before being used.

CRCs are based on a technique for dividing polynomials that's routinely discussed in high-school algebra classes. The key point is to think of the data being transferred as a single long list of bits. By writing this list as a polynomial, a chunk of data becomes a single mathematical entity that can be manipulated as a whole. Put differently, polynomials provide a way of computing that treats any large block of binary data as a single mathematical object.

Grade-school math classes teach a technique for division referred to as "long division" which looks at the dividend (the part under the division symbol) a few digits at a time. High-school math classes teach this same method for dividing polynomials. As you may recall, division of polynomials is even simpler than division of integers, because only the first term matters. For example, the first step in dividing 2*x*^{4}-2*x*^{3}-12*x*^{2}+12*x*-6 by *x*^{2}-3*x*+2 is to divide 2*x*^{4} by *x*^{2}, yielding 2*x*^{2}. The rest of the process (see Figure 1) is the same as dividing integers. At each step, you divide the leading terms to obtain a new term in the quotient, then multiply and subtract.

If you look back at the original bit-at-a-time CRC calculator, you'll see these same steps: Bring another term (bit) into consideration (shift), divide the leading terms (test the high-order bit), multiply (by 0 or 1), and subtract (exclusive OR). Repeating this procedure for each term (bit) eventually leaves a remainder (CRC).

Because you're dividing by a second-degree polynomial in Figure 1, you always consider three terms at a time. After subtracting, you're left with two terms, so you "shift" another term into consideration, just as the CRC algorithm shifts the next bit into the CRC variable. Also, the CRC algorithm only tests a single bit at each loop, much as the division algorithm divides only the leading terms.

The reduction is the part that doesn't make sense at first glance. The subtraction in the normal division algorithm somehow has been mutated into an exclusive-or operation. Clearly, these aren't ordinary polynomials...

The number system of integers modulo 2 has only two numbers -- 0 or 1. This number system allows all four of the basic arithmetic operations according to Figure 2. Just as in normal arithmetic, division by zero is undefined.

Using integers modulo 2 to create polynomials gives you some peculiar properties. For example, since 1+1=0 in modulo 2, you know that (*x*+1)^{2}= *x*^{2}+*x*+*x*+1=*x*^{2}+1, which is very different from what you get using the standard integers.

Now consider the word "dog" in ASCII. As a string of bits, it is 0110 0100 0110 1111 0110 0111. By considering these bits as the coefficients, you have a 22nd degree message polynomial: *x*^{22}+*x*^{21}+*x*^{18}+*x*^{14}+*x*^{13}+*x*^{11}+*x*^{10}+*x*^{9}+*x*^{8}+*x*^{6}+*x*^{5}+*x*^{2}+*x*+1.

In a similar way, even a multimegabyte data file can be viewed as a single large polynomial. Fortunately, because the division algorithm only requires you to look at a few bits at a time, you can compute with these very large polynomials quite easily.

Be forewarned that many people are tempted to equate these polynomials with binary numbers by "substituting 2." That doesn't work. A moment ago you saw that (*x*+1)^{2}=*x*^{2}+1 in modulo 2. If you "substitute 2," this would mean that (2+1)^{2}=32=9 is the same as 22+1=4+1=5. It's more appropriate to think of a term such as *x*^{22} as meaning that there is a one bit in position 22. The polynomial is then just a list of which bit positions hold ones.

Now, I'm ready to define a cyclic redundancy check: A CRC is the remainder obtained by dividing the message polynomial by the generating polynomial. The key to translating this into the code I presented earlier is to realize that adding and subtracting polynomials in modulo 2 is really just the exclusive-or operation, as reflected in the addition and subtraction tables in Figure 2. The initial bit-at-a-time CRC calculator is just the direct realization of the long division algorithm for polynomials modulo 2.

Because CRCs use polynomials modulo 2, there are only two possibilities at each step of the long division algorithm -- either it divides or it doesn't. For example, a 16th degree polynomial divides any other 16th degree polynomial. That's the key to the bit-at-a-time CRC calculator. Checking the high-order bit is simply checking to see if a polynomial is 16th degree. If it is, the exclusive-or does the subtraction.

In symbols, if the message polynomial so far is *m*, and the generating polynomial is *g*, then *m/g* can be written in terms of a quotient *q* and a remainder *r*: *m/g=q+r/g*.

Assume that's already been done. In particular, you know *r*, which is the CRC of the message before this bit. If the next bit is *b*, the new message is *mx+b*. (Multiplying *m* by *x* raises each power by one; that is, it performs a left shift.) The previous remainder then tells you something about the new remainder: *(mx+b)/g=mx/g+b/g=qx+(rx+b)/g*.

This result tells us that the remainder from *(mx+b)/g* is the same as the remainder from *(rx+b)/g*. To find out the new remainder (new CRC), just shift the new bit *b* into the old remainder, and then take the remainder from dividing that by *g*. Because the old remainder was at most 15th degree, the new remainder cannot be more than 16th degree, so the division is just one bit test and a possible exclusive-or. If you look back at the original bit-at-a-time CRC calculator, you'll see that's exactly what it's doing. The nice part about only needing the old remainder to compute the new one is that it's not necessary to look at the entire file to compute the CRC; you only need to keep track of the CRC thus far as you scan through the bits in the file.

Instead of looking at one bit *b* at a time, you could consider the next eight bits at a time, which reveals the table-driven algorithm. Just let *B*=*b*_{0}*x*^{7}+*b*_{1}*x*^{6}+... +*b*_{6}*x*+*b*_{7} (*B* is the next byte of the message). Then, as before, the old remainder *r* becomes the basis for computing the new remainder: *(mx*^{8}*+B)/g=qx*^{8}*+(rx*^{8}*+B)/g*. Since *B* has a smaller degree than *g*, this simplifies to *rx*^{8}*/g+B/g*. The table is just the precomputed remainder from *rx*^{8}*/g*.

To understand why CRCs work, start by examining what happens when there's an error -- when the data you receive is different than what was sent. If you subtract the two different message polynomials (the one that was sent and the one you received), you have a polynomial with a one bit at each error location.

This "error polynomial" is the key. If the CRC doesn't detect the error, then the error polynomial is divisible by the generating polynomial. If the generating polynomial isn't divisible by *x* (the least-significant bit is a one), then the first and last wrong bits must be at least as far apart as the degree of the generating polynomial. This is the often-quoted fact that a CRC will detect any burst error shorter than the generating polynomial.

By selecting the factors of the generating polynomial, you can bias the CRC to detect certain errors. For example, every multiple of *x*+1 has even parity, so the CRC-16 (which has a generating polynomial divisible by *x*+1) will catch every error which affects an odd number of bits. (The error polynomial will have odd parity, and hence can't be a multiple of the generating polynomial.) Conversely, if an error affects an even number of bits, the CRC-16 is that much less likely to detect it, so most generating polynomials cannot be factored.

I originally wrote this description of the CRC after reading many articles that claimed CRCs relied on "properties of prime numbers." Such near-misses are all the more frustrating because the complete story is only slightly more complex, and it gives practical insight into how to compute a wide variety of different CRCs.

For more information on CRCs, see Andy Lowry's algorithm for computing the 16-bit Kermit CRC in the Kermit protocol reference, available via anonymous FTP from kermit.columbia.edu. Likewise, Terry Ritter provided a good description of CRCs in "The Great CRC Mystery" (*DDJ*, February 1986).

DDJ

short BitwiseCrc16(int bit, short crc) { long longcrc = crc; longcrc = (longcrc << 1) ^ bit; /* next bit */ if (longcrc & 0x10000) longcrc ^= 0x11021; /* reduce */ return(longcrc & 0xFFFF); }

short BitwiseCrc16(int bit, short crc) { if (crc & 0x8000) crc = (crc << 1) ^ bit ^ 0x1021 ; else crc = (crc << 1) ^ bit; return (crc & 0xffff); }

static short Crc16Table[256]; void InitCrc16() { short i, j, crc; for(i=0; i < 256; i++) { crc = (i << 8); /* Put i into MSB */ for(j=0; j < 8; j++) /* Do 8 reductions */ crc = (crc << 1) ^((crc & 0x8000)? 0x1021:0); Crc16Table[i] = crc & 0xFFFF; } } short Crc16(int ch, short crc) { crc = Crc16Table[((crc >> 8) & 255)] ^ (crc << 8) ^ ch; return crc & 0xFFFF; }

/* InitCrc16() is same as Listing Three */short XYZModemCrc16(int ch, short crc) { crc = Crc16Table[((crc >> 8) ^ ch) & 255] ^ (crc << 8); return crc & 0xFFFF; }

static short KermitCrc16Table[256];void KermitInitCrc16() { int i, j, crc; for(i=0; i <256; i++) { crc = i; /* Start with i in low-order byte */ for(j=0; j < 8; j++) crc = (crc >> 1) ^ ((crc & 1) ? 0x8408 : 0); KermitCrc16Table[i] = crc & 0xFFFF; } } short KermitCrc16(int ch, short crc) { crc = KermitCrc16Table[(crc ^ ch) & 255)] ^ (crc >> 8); return crc & 0xFFFF; }

static long Crc32Table[256];InitCrc32() { int i, j; long crc; for(i=0; i <256; i++) { crc = (i << 24); /* Put i into MSB */ for(j=0; j < 8; j++) /* 8 reductions */ crc = (crc << 1) ^ ((crc & 0x80000000L) ? 0x04c11db7L : 0); Crc32Table[i] = crc; } } long Crc32(int ch, long crc) { crc = Crc32Table[((crc >> 24) ^ ch) & 255] ^ (crc << 8); return crc & 0xFFFFFFFF; }