Purpose/Goals

This book describes data structures, methods of organizing large amounts of data, and algorithm analysis, the estimation of the running time of algorithms. As computers become faster and faster, the need for programs that can handle large amounts of input becomes more acute. Paradoxically, this requires more careful attention to efficiency, since inefficiencies in programs become most obvious when input sizes are large. By analyzing an algorithm before it is actually coded, students can decide if a particular solution will be feasible. For example, in this text students look at specific problems and see how careful implementations can reduce the time constraint for large amounts of data from 16 years to less than a second. Therefore, no algorithm or data structure is presented without an explanation of its running time. In some cases, minute details that affect the running time of the implementation are explored.

Once a solution method is determined, a program must still be written. As computers have become more powerful, the problems they solve have become larger and more complex, thus requiring development of more intricate programs to solve the problems. The goal of this text is to teach students good programming and algorithm analysis skills simultaneously so that they can develop such programs with the maximum amount of efficiency.

This book is suitable for either an advanced data structures (CS7) course or a first-year graduate course in algorithm analysis. Students should have some knowledge of intermediate programming, including such topics as pointers and recursion, and some background in discrete math.

Approach

I believe it is important for students to learn how to program for themselves, not how to copy programs from a book. On the other hand, it is virtually impossible to discuss realistic programming issues without including sample code. For this reason, the book usually provides about half to three-quarters of an implementation, and the student is encouraged to supply the rest.

The algorithms in this book are presented in ANSI C, which, despite some flaws, is arguably the most popular systems programming language. The use of C instead of Pascal allows the use of dynamically allocated arrays (see for instance rehashing in Ch. 5). It also produces simplified code in several places, usually because the and (&&) operation is short-circuited.

Most criticisms of C center on the fact that it is easy to write code that is barely readable. Some of the more standard tricks, such as the simultaneous assignment and testing against 0 via

`if (x=y)`

are generally not used in the text, since the loss of clarity is compensated by only a few keystrokes and no increased speed. I believe that this book demonstrates that unreadable code can be avoided by exercising reasonable care.

Overview

Chapter 1 contains review material on discrete math and recursion. I believe the only way to be comfortable with recursion is to see good uses over and over. Therefore, recursion is prevalent in this text, with examples in every chapter except Chapter 5.

Chapter 2 deals with algorithm analysis. This chapter explains asymptotic analysis and its major weaknesses. Many examples are provided, including an in-depth explanation of logarithmic running time. Simple recursive programs are analyzed by intuitively converting them into iterative programs. More complicated divide-and-conquer programs are introduced, but some of the analysis (solving recurrence relations) is implicitly delayed until Chapter 7, where it is performed in detail.

Chapter 3 covers lists, stacks, and queues. The emphasis here is on coding these data structures using ADTS, fast implementation of these data structures, and an exposition of some of their uses. There are almost no programs (just routines), but the exercises contain plenty of ideas for programming assignments.

Chapter 4 covers trees, with an emphasis on search trees, including external search trees (B-trees). The UNIX file system and expression trees are used as examples. AVL trees and splay trees are introduced but not analyzed. Seventy-five percent of the code is written, leaving similar cases to be completed by the student. Additional coverage of trees, such as file compression and game trees, is deferred until Chapter 10. Data structures for an external medium are considered as the final topic in several chapters.

Chapter 5 is a relatively short chapter concerning hash tables. Some analysis is performed and extendible hashing is covered at the end of the chapter.

Chapter 6 is about priority queues. Binary heaps are covered, and there is additional material on some of the theoretically interesting implementations of priority queues.

Chapter 7 covers sorting. It is very specific with respect to coding details and analysis. All the important general-purpose sorting algorithms are covered and compared. Three algorithms are analyzed in detail: insertion sort, Shellsort, and quicksort. External sorting is covered at the end of the chapter.

Chapter 8 discusses the disjoint set algorithm with proof of the running time. This is a short and specific chapter that can be skipped if Kruskal's algorithm is not discussed.

Chapter 9 covers graph algorithms. Algorithms on graphs are interesting not only because they frequently occur in practice but also because their running time is so heavily dependent on the proper use of data structures. Virtually all of the standard algorithms are presented along with appropriate data structures, pseudocode, and analysis of running time. To place these problems in a proper context, a short discussion on complexity theory (including NP-completeness and undecidability) is provided.

Chapter 10 covers algorithm design by examining common problem-solving techniques. This chapter is heavily fortified with examples. Pseudocode is used in these later chapters so that the student's appreciation of an example algorithm is not obscured by implementation details.

Chapter 11 deals with amortized analysis. Three data structures from Chapters 4 and 6 and the Fibonacci heap, introduced in this chapter, are analyzed.

Chapters 1-9 provide enough material for most one-semester data structures courses. If time permits, then Chapter 10 can be covered. A graduate course on algorithm analysis could cover Chapters 7-11. The advanced data structures analyzed in Chapter 11 can easily be referred to in the earlier chapters. The discussion of NP-completeness in Chapter 9 is far too brief to be used in such a course. Garey and Johnson's book on NP-completeness can be used to augment this text.

Exercises

Exercises, provided at the end of each chapter, match the order in which material is presented. The last exercises may address the chapter as a whole rather than a specific section. Difficult exercises are marked with an asterisk, and more challenging exercises have two asterisks.

A solutions manual containing solutions to almost all the exercises is available separately from The Benjamin/Cummings Publishing Company.

References

References are placed at the end of each chapter. Generally the references either are historical, representing the original source of the material, or they represent extensions and improvements to the results given in the text. Some references represent solutions to exercises.

Acknowledgments

I would like to thank the many people who helped me in the preparation of this and previous versions of the book. The professionals at Benjamin/Cummings made my book a considerably less harrowing experience than I had been led to expect. I'd like to thank my previous editors, Alan Apt and John Thompson, as well as Carter Shanklin, who has edited this version, and Carter's assistant, Vivian McDougal, for answering all my questions and putting up with my delays. Gail Carrigan at Benjamin/Cummings and Melissa G. Madsen and Laura Snyder at Publication Services did a wonderful job with production. The C version was handled by Joe Heathward and his outstanding staff, who were able to meet the production schedule despite the delays caused by Hurricane Andrew.

I would like to thank the reviewers, who provided valuable comments, many of which have been incorporated into the text. Alphabetically, they are Vicki Allan (Utah State University), Henry Bauer (University of Wyoming), Alex Biliris (Boston University), Jan Carroll (University of North Texas), Dan Hirschberg (University of California, Irvine), Julia Hodges (Mississippi State University), Bill Kraynek (Florida International University), Rayno D. Niemi (Rochester Institute of Technology), Robert O. Pettus (University of South Carolina), Robert Probasco (University of Idaho), Charles Williams (Georgia State University), and Chris Wilson (University of Oregon). I would particularly like to thank Vicki Allan, who carefully read every draft and provided very detailed suggestions for improvement.

At FIU, many people helped with this project. Xinwei Cui and John Tso provided me with their class notes. I'd like to thank Bill Kraynek, Wes Mackey, Jai Navlakha, and Wei Sun for using drafts in their courses, and the many students who suffered through the sketchy early drafts. Maria Fiorenza, Eduardo Gonzalez, Ancin Peter, Tim Riley, Jefre Riser, and Magaly Sotolongo reported several errors, and Mike Hall checked through an early draft for programming errors. A special thanks goes to Yuzheng Ding, who compiled and tested every program in the original book, including the conversion of pseudocode to Pascal. I'd be remiss to forget Carlos Ibarra and Steve Luis, who kept the printers and the computer system working and sent out tapes on a minute's notice.

This book is a product of a love for data structures and algorithms that can be obtained only from top educators. I'd like to take the time to thank Bob Hopkins, E. C. Horvath, and Rich Mendez, who taught me at Cooper Union, and Bob Sedgewick, Ken Steiglitz, and Bob Tarjan from Princeton.

Finally, I'd like to thank all my friends who provided encouragement during the project. In particular, I'd like to thank Michele Dorchak, Arvin Park, and Tim Snyder for listening to my stories; Bill Kraynek, Alex Pelin, and Norman Pestaina for being civil next-door (office) neighbors, even when I wasn't; Lynn and Toby Berk for shelter during Andrew, and the HTMC for work relief.

Any mistakes in this book are, of course, my own. I would appreciate reports of any errors you find; my e-mail address is weiss@fiu.edu.

M.A.W.

Miami, Florida

September 1992