# CHAPTER 26: ALL-PAIRS SHORTEST PATHS

We can solve an all-pairs shortest-paths problem by running a single-source shortest-paths algorithm |V| times, once with each vertex as the source. If all edge weights are nonnegative, we can use Dijkstra's algorithm. With the linear-array implementation of the priority queue, the running time is O(V^3 + VE) = O(V^3). The binary-heap implementation of the priority queue yields a running time of O(VE lg V), which is an improvement if the graph is sparse. Alternatively, we can implement the priority queue with a Fibonacci heap, yielding a running time of O(V^2 lg V + VE).

If negative-weight edges are allowed, Dijkstra's algorithm can no longer be used. Instead, we must run the slower Bellman-Ford algorithm once from each vertex. The resulting running time is O(V^2 E), which on a dense graph is O(V^4). In this chapter we shall see how to do better. We shall also investigate the relation of the all-pairs shortest-paths problem to matrix multiplication and study its algebraic structure.

Unlike the single-source algorithms, which assume an adjacency-list representation of the graph, most of the algorithms in this chapter use an adjacency-matrix representation. (Johnson's algorithm for sparse graphs uses adjacency lists.) The input is an n × n matrix W representing the edge weights of an n-vertex directed graph G = (V, E). That is, W = (wij), where

#### (26.1)

wij = 0 if i = j; wij is the weight of directed edge (i, j) if i ≠ j and (i, j) ∈ E; and wij = ∞ if i ≠ j and (i, j) ∉ E.

Negative-weight edges are allowed, but we assume for the time being that the input graph contains no negative-weight cycles.

The tabular output of the all-pairs shortest-paths algorithms presented in this chapter is an n × n matrix D = (dij), where entry dij contains the weight of a shortest path from vertex i to vertex j. That is, if we let δ(i, j) denote the shortest-path weight from vertex i to vertex j (as in Chapter 25), then dij = δ(i, j) at termination.

To solve the all-pairs shortest-paths problem, we need to compute not only shortest-path weights but also a predecessor matrix Π = (πij), where πij is NIL if either i = j or there is no path from i to j, and otherwise πij is some predecessor of j on a shortest path from i. For each vertex i ∈ V, we define the predecessor subgraph of G for i as Gπ,i = (Vπ,i, Eπ,i), where

`Vπ,i = {j ∈ V : πij ≠ NIL} ∪ {i}`

and

`Eπ,i = {(πij, j) : j ∈ Vπ,i and πij ≠ NIL} .`

If Gπ,i is a shortest-paths tree, then the following procedure, which is a modified version of the PRINT-PATH procedure from Chapter 23, prints a shortest path from vertex i to vertex j.

`PRINT-ALL-PAIRS-SHORTEST-PATH(Π, i, j)`

`1  if i = j`

`2      then print i`

`3      else if πij = NIL`

`4              then print "no path from" i "to" j "exists"`

`5              else PRINT-ALL-PAIRS-SHORTEST-PATH(Π, i, πij)`

`6                   print j`
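The procedure translates directly into Python. The sketch below is our own rendering, not part of the text: the function name, 0-indexed vertices, and the convention of a nested list `pi` with `None` standing for NIL are all assumptions. It collects the path into a list rather than printing, which makes it easier to test.

```python
def print_all_pairs_shortest_path(pi, i, j, out=None):
    """Return the vertices on a shortest path from i to j, given the
    predecessor matrix pi (pi[i][j] is None where the text has NIL)."""
    if out is None:
        out = []
    if i == j:
        out.append(i)                      # line 2: the path starts (and ends) at i
    elif pi[i][j] is None:
        out.append("no path from %s to %s exists" % (i, j))   # line 4
    else:
        # line 5: recurse on the predecessor of j, then emit j itself
        print_all_pairs_shortest_path(pi, i, pi[i][j], out)
        out.append(j)
    return out
```

For the three-vertex path 0 → 1 → 2, the row `pi[0]` would be `[None, 0, 1]`, and the call `print_all_pairs_shortest_path(pi, 0, 2)` walks predecessors back to the source before emitting each vertex.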

In order to highlight the essential features of the all-pairs algorithms in this chapter, we won't cover the creation and properties of predecessor matrices as extensively as we dealt with predecessor subgraphs in Chapter 25. The basics are covered by some of the exercises.

## Chapter outline

Section 26.1 presents a dynamic-programming algorithm based on matrix multiplication to solve the all-pairs shortest-paths problem. Using the technique of "repeated squaring," this algorithm can be made to run in Θ(V^3 lg V) time. Another dynamic-programming algorithm, the Floyd-Warshall algorithm, is given in Section 26.2. The Floyd-Warshall algorithm runs in Θ(V^3) time. Section 26.2 also covers the problem of finding the transitive closure of a directed graph, which is related to the all-pairs shortest-paths problem. Johnson's algorithm is presented in Section 26.3. Unlike the other algorithms in this chapter, Johnson's algorithm uses the adjacency-list representation of a graph. It solves the all-pairs shortest-paths problem in O(V^2 lg V + VE) time, which makes it a good algorithm for large, sparse graphs. Finally, in Section 26.4, we examine an algebraic structure called a "closed semiring," which allows many shortest-paths algorithms to be applied to a host of other all-pairs problems not involving shortest paths.

Before proceeding, we need to establish some conventions for adjacency-matrix representations. First, we shall generally assume that the input graph G = (V, E) has n vertices, so that n = |V|. Second, we shall use the convention of denoting matrices by uppercase letters, such as W or D, and their individual elements by subscripted lowercase letters, such as wij or dij. Some matrices will have parenthesized superscripts, as in D(m) = (dij(m)), to indicate iterates. Finally, for a given n × n matrix A, we shall assume that the value of n is stored in the attribute rows[A].

# 26.1 Shortest paths and matrix multiplication

Before proceeding, let us briefly recap the steps given in Chapter 16 for developing a dynamic-programming algorithm.

1. Characterize the structure of an optimal solution.

2. Recursively define the value of an optimal solution.

3. Compute the value of an optimal solution in a bottom-up fashion.

(The fourth step, constructing an optimal solution from computed information, is dealt with in the exercises.)

## A recursive solution to the all-pairs shortest-paths problem

Now, let dij(m) be the minimum weight of any path from vertex i to vertex j that contains at most m edges. When m = 0, there is a shortest path from i to j with no edges if and only if i = j. Thus,

dij(0) = 0 if i = j, and dij(0) = ∞ if i ≠ j.

For m ≥ 1, we compute dij(m) as the minimum of dij(m-1) (the weight of a shortest path from i to j consisting of at most m - 1 edges) and the minimum weight of any path from i to j consisting of at most m edges, obtained by examining all possible predecessors k of j. Thus, we recursively define

#### (26.2)

dij(m) = min( dij(m-1), min{1 ≤ k ≤ n} (dik(m-1) + wkj) ) = min{1 ≤ k ≤ n} (dik(m-1) + wkj) .

The latter equality follows since wjj = 0 for all j.

What are the actual shortest-path weights δ(i, j)? If the graph contains no negative-weight cycles, then all shortest paths are simple and thus contain at most n - 1 edges. A path from vertex i to vertex j with more than n - 1 edges cannot have less weight than a shortest path from i to j. The actual shortest-path weights are therefore given by

#### (26.3)

δ(i, j) = dij(n-1) = dij(n) = dij(n+1) = · · · .

## Computing the shortest-path weights bottom up

Taking as our input the matrix W = (wij), we now compute a series of matrices D(1), D(2), . . . , D(n-1), where for m = 1, 2, . . . , n - 1, we have D(m) = (dij(m)). The final matrix D(n-1) contains the actual shortest-path weights. Observe that since dij(1) = wij for all vertices i, j ∈ V, we have D(1) = W.

The heart of the algorithm is the following procedure, which, given matrices D(m-1) and W, returns the matrix D(m). That is, it extends the shortest paths computed so far by one more edge.

`EXTEND-SHORTEST-PATHS(D, W)`

`1  n ← rows[D]`

`2  let D' = (d'ij) be an n × n matrix`

`3  for i ← 1 to n`

`4       do for j ← 1 to n`

`5              do d'ij ← ∞`

`6                 for k ← 1 to n`

`7                     do d'ij ← min(d'ij, dik + wkj)`

`8  return D'`

The procedure computes a matrix D' = (d'ij), which it returns at the end. It does so by computing equation (26.2) for all i and j, using D for D(m-1) and D' for D(m). (It is written without the superscripts to make its input and output matrices independent of m.) Its running time is Θ(n^3) due to the three nested for loops.
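EXTEND-SHORTEST-PATHS can be sketched in Python as follows. The representation is our own assumption: matrices are lists of lists, vertices are 0-indexed, and `float("inf")` stands for ∞.

```python
INF = float("inf")

def extend_shortest_paths(D, W):
    """One pass of equation (26.2): extend every shortest path computed
    so far by at most one more edge. D and W are n x n lists of lists."""
    n = len(D)
    Dp = [[INF] * n for _ in range(n)]   # d'ij starts at infinity (line 5)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # relax through every possible predecessor k of j (line 7)
                Dp[i][j] = min(Dp[i][j], D[i][k] + W[k][j])
    return Dp
```

Note that `min` with `INF` behaves exactly like the identity for min used in the text, since `INF + w` is still `INF` in IEEE floating point.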

We can now see the relation to matrix multiplication. Suppose we wish to compute the matrix product C = A · B of two n × n matrices A and B. Then, for i, j = 1, 2, . . . , n, we compute

#### (26.4)

cij = ∑{k = 1 to n} aik · bkj .

Observe that if we make the substitutions

`d(m-1)  →  a ,`

`w       →  b ,`

`d(m)    →  c ,`

`min     →  + ,`

`+       →  · `

in equation (26.2), we obtain equation (26.4). Thus, if we make these changes to EXTEND-SHORTEST-PATHS and also replace ∞ (the identity for min) by 0 (the identity for +), we obtain the straightforward Θ(n^3)-time procedure for matrix multiplication.

`MATRIX-MULTIPLY(A, B)`

`1  n ← rows[A]`

`2  let C be an n × n matrix`

`3  for i ← 1 to n`

`4       do for j ← 1 to n`

`5              do cij ← 0`

`6                 for k ← 1 to n`

`7                     do cij ← cij + aik · bkj`

`8  return C`

Returning to the all-pairs shortest-paths problem, we compute the shortest-path weights by extending shortest paths edge by edge. Letting A · B denote the matrix "product" returned by EXTEND-SHORTEST-PATHS(A, B), we compute the sequence of n - 1 matrices

`D(1)  =  D(0) · W  =  W,`

`D(2)  =  D(1) · W  =  W^2,`

`D(3)  =  D(2) · W  =  W^3,`

`  ⋮`

`D(n-1)  =  D(n-2) · W  =  W^(n-1) .`

As we argued above, the matrix D(n-1) = W^(n-1) contains the shortest-path weights. The following procedure computes this sequence in Θ(n^4) time.

`SLOW-ALL-PAIRS-SHORTEST-PATHS(W)`

`1  n ← rows[W]`

`2  D(1) ← W`

`3  for m ← 2 to n - 1`

`4       do D(m) ← EXTEND-SHORTEST-PATHS(D(m-1), W)`

`5  return D(n-1)`

Figure 26.1 shows a graph and the matrices D(m) computed by the procedure SLOW-ALL-PAIRS-SHORTEST-PATHS.
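The whole Θ(n^4) method can be sketched in Python by iterating the extension step n - 2 times. As before, the list-of-lists representation, 0-indexed vertices, and `float("inf")` for ∞ are our own conventions.

```python
INF = float("inf")

def extend_shortest_paths(D, W):
    """Equation (26.2): extend every shortest path by at most one edge."""
    n = len(D)
    Dp = [[INF] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                Dp[i][j] = min(Dp[i][j], D[i][k] + W[k][j])
    return Dp

def slow_all_pairs_shortest_paths(W):
    """Compute D(n-1) by successive extensions; Theta(n^4) overall."""
    n = len(W)
    D = W                         # D(1) = W
    for _ in range(2, n):         # m = 2, 3, ..., n - 1
        D = extend_shortest_paths(D, W)
    return D
```

On the four-vertex chain 0 → 1 → 2 → 3 with unit weights, the loop runs twice, producing D(3), whose entry for (0, 3) is 3.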

## Improving the running time

Our goal, however, is not to compute all the D(m) matrices: we are interested only in the matrix D(n-1). Recall that in the absence of negative-weight cycles, equation (26.3) implies D(m) = D(n-1) for all integers m ≥ n - 1. We can compute D(n-1) with only ⌈lg(n - 1)⌉ matrix products by computing the sequence

`D(1)  =  W,`

`D(2)  =  W^2  =  W · W,`

`D(4)  =  W^4  =  W^2 · W^2,`

`D(8)  =  W^8  =  W^4 · W^4,`

`  ⋮`

`D(2^⌈lg(n-1)⌉)  =  W^(2^⌈lg(n-1)⌉)  =  W^(2^(⌈lg(n-1)⌉-1)) · W^(2^(⌈lg(n-1)⌉-1)) .`

Since 2^⌈lg(n-1)⌉ ≥ n - 1, the final product D(2^⌈lg(n-1)⌉) is equal to D(n-1).

#### Figure 26.1 A directed graph and the sequence of matrices D(m) computed by SLOW-ALL-PAIRS-SHORTEST-PATHS. The reader may verify that D(5) = D(4) · W is equal to D(4), and thus D(m) = D(4) for all m ≥ 4.

`FASTER-ALL-PAIRS-SHORTEST-PATHS(W)`

`1  n ← rows[W]`

`2  D(1) ← W`

`3  m ← 1`

`4  while n - 1 > m`

`5      do D(2m) ← EXTEND-SHORTEST-PATHS(D(m), D(m))`

`6         m ← 2m`

`7  return D(m)`

In each iteration of the while loop of lines 4-6, we compute D(2m) = (D(m))^2, starting with m = 1. At the end of each iteration, we double the value of m. The final iteration computes D(n-1) by actually computing D(2m) for some m with n - 1 ≤ 2m < 2n - 2. By equation (26.3), D(2m) = D(n-1). The next time the test in line 4 is performed, m has been doubled, so now n - 1 ≤ m, the test fails, and the procedure returns the last matrix it computed.

The running time of FASTER-ALL-PAIRS-SHORTEST-PATHS is Θ(n^3 lg n), since each of the ⌈lg(n - 1)⌉ matrix products takes Θ(n^3) time. Observe that the code is tight, containing no elaborate data structures, and the constant hidden in the Θ-notation is therefore small.
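The repeated-squaring version can be sketched in Python with the same conventions as before (lists of lists, 0-indexed vertices, `float("inf")` for ∞):

```python
INF = float("inf")

def extend_shortest_paths(D, W):
    """Equation (26.2): one min-plus 'product' of D and W."""
    n = len(D)
    Dp = [[INF] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                Dp[i][j] = min(Dp[i][j], D[i][k] + W[k][j])
    return Dp

def faster_all_pairs_shortest_paths(W):
    """Repeated squaring: only ceil(lg(n-1)) min-plus products."""
    n = len(W)
    D = W                                  # D(1) = W
    m = 1
    while n - 1 > m:
        D = extend_shortest_paths(D, D)    # D(2m) = D(m) "squared"
        m *= 2
    return D
```

For n = 4 the loop performs two squarings (m = 1, then m = 2), returning D(4), which by equation (26.3) equals D(3) = D(n-1).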

## Exercises

Run SLOW-ALL-PAIRS-SHORTEST-PATHS on the weighted, directed graph of Figure 26.2, showing the matrices that result for each iteration of the respective loops. Then do the same for FASTER-ALL-PAIRS-SHORTEST-PATHS.

Why do we require that wii = 0 for all 1 ≤ i ≤ n?

What does the matrix

D(0) = (dij(0)), with dij(0) = 0 if i = j and dij(0) = ∞ otherwise,

used in the shortest-paths algorithms correspond to in regular matrix multiplication?

Show how to express the single-source shortest-paths problem as a product of matrices and a vector. Describe how evaluating this product corresponds to a Bellman-Ford-like algorithm (see Section 25.3).

Suppose we also wish to compute the vertices on shortest paths in the algorithms of this section. Show how to compute the predecessor matrix from the completed matrix D of shortest-path weights in O(n3) time.

The vertices on shortest paths can also be computed at the same time as the shortest-path weights. Let us define πij(m) to be the predecessor of vertex j on any minimum-weight path from i to j that contains at most m edges. Modify EXTEND-SHORTEST-PATHS and SLOW-ALL-PAIRS-SHORTEST-PATHS to compute the matrices Π(1), Π(2), . . . , Π(n-1) as the matrices D(1), D(2), . . . , D(n-1) are computed.

Give an efficient algorithm to find the length (number of edges) of a minimum-length negative-weight cycle in a graph.

# 26.2 The Floyd-Warshall algorithm

## The structure of a shortest path

The Floyd-Warshall algorithm is based on the following observation. Let the vertices of G be V = {1, 2, . . . , n}, and consider a subset {1, 2, . . . , k} of vertices for some k. For any pair of vertices i, j ∈ V, consider all paths from i to j whose intermediate vertices are all drawn from {1, 2, . . . , k}, and let p be a minimum-weight path from among them. (Path p is simple, since we assume that G contains no negative-weight cycles.) The Floyd-Warshall algorithm exploits a relationship between path p and shortest paths from i to j with all intermediate vertices in the set {1, 2, . . . , k - 1}. The relationship depends on whether or not k is an intermediate vertex of path p.

#### Figure 26.3 Path p is a shortest path from vertex i to vertex j, and k is the highest-numbered intermediate vertex of p. Path p1, the portion of path p from vertex i to vertex k, has all intermediate vertices in the set {1, 2, . . . , k - 1}. The same holds for path p2 from vertex k to vertex j.

If k is not an intermediate vertex of path p, then all intermediate vertices of path p are in the set {1, 2, . . . , k - 1}. Thus, a shortest path from vertex i to vertex j with all intermediate vertices in the set {1, 2, . . . , k - 1} is also a shortest path from i to j with all intermediate vertices in the set {1, 2, . . . , k}.

If k is an intermediate vertex of path p, then we break p down into i ⇝ k ⇝ j, as shown in Figure 26.3, where p1 is the portion of p from i to k and p2 is the portion from k to j. By Lemma 25.1, p1 is a shortest path from i to k with all intermediate vertices in the set {1, 2, . . . , k}. Because vertex k is not an intermediate vertex of path p1, p1 is in fact a shortest path from i to k with all intermediate vertices in the set {1, 2, . . . , k - 1}. Similarly, p2 is a shortest path from vertex k to vertex j with all intermediate vertices in the set {1, 2, . . . , k - 1}.

## A recursive solution to the all-pairs shortest-paths problem

Based on the above observations, we define a recursive formulation of shortest-path estimates different from the one in Section 26.1. Let dij(k) be the weight of a shortest path from vertex i to vertex j with all intermediate vertices in the set {1, 2, . . . , k}. When k = 0, a path from vertex i to vertex j with no intermediate vertex numbered higher than 0 has no intermediate vertices at all. It thus has at most one edge, and hence dij(0) = wij. A recursive definition is given by

#### (26.5)

dij(k) = wij if k = 0, and dij(k) = min( dij(k-1), dik(k-1) + dkj(k-1) ) if k ≥ 1.

The matrix D(n) = (dij(n)) gives the final answer--dij(n) = δ(i, j) for all i, j ∈ V--because all intermediate vertices are in the set {1, 2, . . . , n}.

## Computing the shortest-path weights bottom up

Based on recurrence (26.5), the following bottom-up procedure computes the values dij(k) in order of increasing values of k. Its input is an n × n matrix W defined as in equation (26.1), and it returns the matrix D(n) of shortest-path weights.

`FLOYD-WARSHALL(W)`

`1  n ← rows[W]`

`2  D(0) ← W`

`3  for k ← 1 to n`

`4       do for i ← 1 to n`

`5              do for j ← 1 to n`

`6                     do dij(k) ← min(dij(k-1), dik(k-1) + dkj(k-1))`

`7  return D(n)`

Figure 26.4 shows a directed graph and the matrices D(k) computed by the Floyd-Warshall algorithm.

The running time of the Floyd-Warshall algorithm is determined by the triply nested for loops of lines 3-6. Each execution of line 6 takes O(1) time. The algorithm thus runs in Θ(n^3) time. As in the final algorithm in Section 26.1, the code is tight, with no elaborate data structures, and so the constant hidden in the Θ-notation is small. Thus, the Floyd-Warshall algorithm is quite practical for even moderate-sized input graphs.
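A Python sketch of the algorithm follows, using our usual conventions (lists of lists, 0-indexed vertices, `float("inf")` for ∞). It is the space-saving variant that drops the superscripts and updates a single matrix in place, keeping only Θ(n^2) storage.

```python
INF = float("inf")

def floyd_warshall(W):
    """Recurrence (26.5) computed bottom-up over k, in a single matrix."""
    n = len(W)
    D = [row[:] for row in W]          # D starts as a copy of W (k = 0)
    for k in range(n):                 # allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D
```

Negative edge weights are handled correctly as long as the graph has no negative-weight cycles, matching the assumption in the text.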

## Constructing a shortest path

There are a variety of different methods for constructing shortest paths in the Floyd-Warshall algorithm. One way is to compute the matrix D of shortest-path weights and then construct the predecessor matrix Π from the D matrix. This method can be implemented to run in O(n^3) time (Exercise 26.1-5). Given the predecessor matrix Π, the PRINT-ALL-PAIRS-SHORTEST-PATH procedure can be used to print the vertices on a given shortest path.

We can compute the predecessor matrix Π "on-line" just as the Floyd-Warshall algorithm computes the matrices D(k). Specifically, we compute a sequence of matrices Π(0), Π(1), . . . , Π(n), where Π = Π(n) and πij(k) is defined to be the predecessor of vertex j on a shortest path from vertex i with all intermediate vertices in the set {1, 2, . . . , k}.

We can give a recursive formulation of πij(k). When k = 0, a shortest path from i to j has no intermediate vertices at all. Thus,

#### (26.6)

πij(0) = NIL if i = j or wij = ∞, and πij(0) = i if i ≠ j and wij < ∞.

For k ≥ 1, if we take the path i ⇝ k ⇝ j, where k ≠ j, then the predecessor of j we choose is the same as the predecessor of j we chose on a shortest path from k with all intermediate vertices in the set {1, 2, . . . , k - 1}. Otherwise, we choose the same predecessor of j that we chose on a shortest path from i with all intermediate vertices in the set {1, 2, . . . , k - 1}. Formally, for k ≥ 1,

#### (26.7)

πij(k) = πij(k-1) if dij(k-1) ≤ dik(k-1) + dkj(k-1), and πij(k) = πkj(k-1) if dij(k-1) > dik(k-1) + dkj(k-1).

We leave the incorporation of the Π(k) matrix computations into the FLOYD-WARSHALL procedure as Exercise 26.2-3. Figure 26.4 shows the sequence of Π(k) matrices that the resulting algorithm computes for the graph of Figure 26.1. The exercise also asks for the more difficult task of proving that the predecessor subgraph Gπ,i is a shortest-paths tree with root i. Yet another way to reconstruct shortest paths is given as Exercise 26.2-6.
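The "on-line" computation of Π per equations (26.6) and (26.7) can be sketched in Python alongside the distance updates. As elsewhere, the 0-indexed vertices, `None` for NIL, and `float("inf")` for ∞ are our own conventions.

```python
INF = float("inf")

def floyd_warshall_with_predecessors(W):
    """Floyd-Warshall computing the predecessor matrix pi alongside D."""
    n = len(W)
    D = [row[:] for row in W]
    # Equation (26.6): pi[i][j] is i when there is an edge (i, j), else NIL.
    pi = [[i if i != j and W[i][j] < INF else None for j in range(n)]
          for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Equation (26.7): strict improvement routes j's
                # predecessor through the path ending at k.
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
                    pi[i][j] = pi[k][j]
    return D, pi
```

The resulting `pi` can be fed to a PRINT-ALL-PAIRS-SHORTEST-PATH-style routine to recover the vertices of any shortest path.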

## Transitive closure of a directed graph

Given a directed graph G = (V, E) with vertex set V = {1, 2, . . . , n}, we may wish to find out whether there is a path in G from i to j for all vertex pairs i, j ∈ V. The transitive closure of G is defined as the graph G* = (V, E*), where

E* = {(i, j) : there is a path from vertex i to vertex j in G} .

One way to compute the transitive closure of a graph in Θ(n^3) time is to assign a weight of 1 to each edge of E and run the Floyd-Warshall algorithm. If there is a path from vertex i to vertex j, we get dij < n. Otherwise, we get dij = ∞.

There is another, similar way to compute the transitive closure of G in Θ(n^3) time that can save time and space in practice. This method substitutes the logical operations ∨ (OR) and ∧ (AND) for the arithmetic operations min and + in the Floyd-Warshall algorithm. For i, j, k = 1, 2, . . . , n, we define tij(k) to be 1 if there exists a path in graph G from vertex i to vertex j with all intermediate vertices in the set {1, 2, . . . , k}, and 0 otherwise. We construct the transitive closure G* = (V, E*) by putting edge (i, j) into E* if and only if tij(n) = 1. A recursive definition of tij(k), analogous to recurrence (26.5), is

tij(0) = 0 if i ≠ j and (i, j) ∉ E, and tij(0) = 1 if i = j or (i, j) ∈ E,

and for k ≥ 1,

#### (26.8)

tij(k) = tij(k-1) ∨ (tik(k-1) ∧ tkj(k-1)) .

As in the Floyd-Warshall algorithm, we compute the matrices T(k) = (tij(k)) in order of increasing k.

`TRANSITIVE-CLOSURE(G)`

` 1  n ← |V[G]|`

` 2  for i ← 1 to n`

` 3       do for j ← 1 to n`

` 4              do if i = j or (i, j) ∈ E[G]`

` 5                    then tij(0) ← 1`

` 6                    else tij(0) ← 0`

` 7  for k ← 1 to n`

` 8       do for i ← 1 to n`

` 9              do for j ← 1 to n`

`10                     do tij(k) ← tij(k-1) ∨ (tik(k-1) ∧ tkj(k-1))`

`11  return T(n)`

Figure 26.5 shows the matrices T(k) computed by the TRANSITIVE-CLOSURE procedure on a sample graph. Like the Floyd-Warshall algorithm, the TRANSITIVE-CLOSURE procedure runs in Θ(n^3) time. On some computers, though, logical operations on single-bit values execute faster than arithmetic operations on integer words of data. Moreover, because the direct transitive-closure algorithm uses only boolean values rather than integer values, its space requirement is less than the Floyd-Warshall algorithm's by a factor corresponding to the size of a word of computer storage.
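The boolean variant translates into a short Python sketch, with recurrence (26.8) in the inner loop. Here the graph is assumed to be given as a 0/1 adjacency matrix `adj` (our convention), and the superscripts are dropped as in the Floyd-Warshall space-saving variant.

```python
def transitive_closure(adj):
    """Boolean Floyd-Warshall: T[i][j] = 1 iff j is reachable from i.
    adj is an n x n 0/1 adjacency matrix."""
    n = len(adj)
    # Base case: t(0) is 1 on the diagonal and wherever an edge exists.
    T = [[1 if i == j or adj[i][j] else 0 for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Recurrence (26.8): OR in a path routed through k.
                T[i][j] = T[i][j] or (T[i][k] and T[k][j])
    return T
```

In a real implementation, packing each row into a machine-word bitmask and replacing the innermost loop with bitwise OR captures the speed and space advantages described above.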

In Section 26.4, we shall see that the correspondence between FLOYD-WARSHALL and TRANSITIVE-CLOSURE is more than coincidence. Both algorithms are based on a type of algebraic structure called a "closed semiring."

## Exercises

As it appears above, the Floyd-Warshall algorithm requires Θ(n^3) space, since we compute dij(k) for i, j, k = 1, 2, . . . , n. Show that the following procedure, which simply drops all the superscripts, is correct, and thus only Θ(n^2) space is required.

#### Figure 26.5 A directed graph and the matrices T(k) computed by the transitive-closure algorithm.

`FLOYD-WARSHALL'(W)`

`1  n ← rows[W]`

`2  D ← W`

`3  for k ← 1 to n`

`4       do for i ← 1 to n`

`5              do for j ← 1 to n`

`6                     do dij ← min(dij, dik + dkj)`

`7  return D`

Modify the FLOYD-WARSHALL procedure to include computation of the Π(k) matrices according to equations (26.6) and (26.7). Prove rigorously that for all i ∈ V, the predecessor subgraph Gπ,i is a shortest-paths tree with root i. (Hint: To show that Gπ,i is acyclic, first show that πij(k) = l implies dij(k) ≥ dil(k) + wlj. Then, adapt the proof of Lemma 25.8.)

Suppose that we modify the way in which equality is handled in equation (26.7): we set πij(k) = πij(k-1) if dij(k-1) < dik(k-1) + dkj(k-1), and πij(k) = πkj(k-1) if dij(k-1) ≥ dik(k-1) + dkj(k-1).

Is this alternative definition of the predecessor matrix correct?

Another way to reconstruct shortest paths in the Floyd-Warshall algorithm uses values φij(k) for i, j, k = 1, 2, . . . , n, where φij(k) is the highest-numbered intermediate vertex of a shortest path from i to j in which all intermediate vertices are in the set {1, 2, . . . , k}. Give a recursive formulation for φij(k), modify the FLOYD-WARSHALL procedure to compute the φij(k) values, and rewrite the PRINT-ALL-PAIRS-SHORTEST-PATH procedure to take the matrix Φ = (φij(n)) as an input. How is the matrix Φ like the s table in the matrix-chain multiplication problem of Section 16.1?

Give an O(V E)-time algorithm for computing the transitive closure of a directed graph G = (V, E).

Suppose that the transitive closure of a directed acyclic graph can be computed in f(V, E) time, where f(V, E) = Ω(V + E) and f is monotonically increasing. Show that the time to compute the transitive closure of a general directed graph G = (V, E) is then O(f(V, E)).

# 26.3 Johnson's algorithm for sparse graphs

Johnson's algorithm uses the technique of reweighting, which works as follows. If all edge weights w in a graph G = (V, E) are nonnegative, we can find shortest paths between all pairs of vertices by running Dijkstra's algorithm once from each vertex. If G has negative-weight edges, we compute a new set of nonnegative edge weights ŵ that allows us to use the same method. The new set of edge weights ŵ must satisfy two important properties.

1. For all pairs of vertices u, v ∈ V, a shortest path from u to v using weight function w is also a shortest path from u to v using weight function ŵ.

2. For all edges (u, v), the new weight ŵ(u, v) is nonnegative.

As we shall see in a moment, the preprocessing of G to determine the new weight function ŵ can be performed in O(VE) time.

## Preserving shortest paths by reweighting

As the following lemma shows, it is easy to come up with a reweighting of the edges that satisfies the first property above. We use δ to denote shortest-path weights derived from weight function w and δ̂ to denote shortest-path weights derived from weight function ŵ.

#### Lemma 26.1 (Reweighting does not change shortest paths)

Given a weighted, directed graph G = (V, E) with weight function w : E → R, let h : V → R be any function mapping vertices to real numbers. For each edge (u, v) ∈ E, define

#### (26.9)

ŵ(u, v) = w(u, v) + h(u) - h(v) .

Let p = ⟨v0, v1, . . . , vk⟩ be a path from vertex v0 to vertex vk. Then, w(p) = δ(v0, vk) if and only if ŵ(p) = δ̂(v0, vk). Also, G has a negative-weight cycle using weight function w if and only if G has a negative-weight cycle using weight function ŵ.

Proof We start by showing that

#### (26.10)

ŵ(p) = w(p) + h(v0) - h(vk) .

We have

`ŵ(p) = ∑{i = 1 to k} ŵ(vi-1, vi)`

`     = ∑{i = 1 to k} [w(vi-1, vi) + h(vi-1) - h(vi)]`

`     = ∑{i = 1 to k} w(vi-1, vi) + h(v0) - h(vk)`

`     = w(p) + h(v0) - h(vk) .`

The third equality follows from the telescoping sum on the second line.

We now show by contradiction that w(p) = δ(v0, vk) implies ŵ(p) = δ̂(v0, vk). Suppose that there is a shorter path p' from v0 to vk using weight function ŵ. Then, ŵ(p') < ŵ(p). By equation (26.10),

`w(p') + h(v0) - h(vk) = ŵ(p') < ŵ(p) = w(p) + h(v0) - h(vk) ,`

which implies that w(p') < w(p). But this contradicts our assumption that p is a shortest path from v0 to vk using w. The proof of the converse is similar.

Finally, we show that G has a negative-weight cycle using weight function w if and only if G has a negative-weight cycle using weight function ŵ. Consider any cycle c = ⟨v0, v1, . . . , vk⟩, where v0 = vk. By equation (26.10),

`ŵ(c) = w(c) + h(v0) - h(vk) = w(c) ,`

and thus c has negative weight using ŵ if and only if it has negative weight using w.

## Producing nonnegative weights by reweighting

Our next goal is to ensure that the second property holds: we want ŵ(u, v) to be nonnegative for all edges (u, v) ∈ E. Given a weighted, directed graph G = (V, E) with weight function w : E → R, we make a new graph G' = (V', E'), where V' = V ∪ {s} for some new vertex s ∉ V and E' = E ∪ {(s, v) : v ∈ V}. We extend the weight function w so that w(s, v) = 0 for all v ∈ V. Note that because s has no edges that enter it, no shortest paths in G', other than those with source s, contain s. Moreover, G' has no negative-weight cycles if and only if G has no negative-weight cycles. Figure 26.6(a) shows the graph G' corresponding to the graph G of Figure 26.1.
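The construction of G' and the reweighting step can be sketched in Python. This is our own sketch, not the text's full Johnson's algorithm: it assumes edges are given as `(u, v, w)` triples over vertices 0..n-1, runs Bellman-Ford from the new vertex s to obtain h(v) = δ(s, v), and returns the reweighted edges ŵ(u, v) = w(u, v) + h(u) - h(v) of equation (26.9).

```python
INF = float("inf")

def reweight(n, edges):
    """Return the nonnegative weights w_hat for Johnson-style reweighting.
    edges is a list of (u, v, w) triples over vertices 0..n-1."""
    s = n                                   # the new vertex s, not in V
    aug = edges + [(s, v, 0) for v in range(n)]   # E' = E U {(s, v)}
    h = [INF] * (n + 1)
    h[s] = 0
    for _ in range(n):                      # |V'| - 1 Bellman-Ford passes
        for (u, v, w) in aug:
            if h[u] + w < h[v]:
                h[v] = h[u] + w
    for (u, v, w) in aug:                   # detect a negative-weight cycle
        if h[u] + w < h[v]:
            raise ValueError("graph contains a negative-weight cycle")
    # Equation (26.9): w_hat(u, v) = w(u, v) + h(u) - h(v) >= 0,
    # since h(v) <= h(u) + w(u, v) by the triangle inequality.
    return {(u, v): w + h[u] - h[v] for (u, v, w) in edges}
```

All returned weights are nonnegative, so Dijkstra's algorithm can then be run from each vertex using ŵ, with the original distances recovered via equation (26.10).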

Now sup