Hack 77. Build a Client-Side Cache

Cut server traffic and improve performance by saving previously retrieved data.

Browsers know how to cache entire web pages. Often, when requesting a web page that you've recently visited, your browser saves time by grabbing the page from a local cache stored either in memory or on your hard drive.

However, Ajax applications often change only parts of a web page. The browser doesn't cache this data, but your Ajax application can.

A good client-side cache needs to make it easy to store new data, and simple to find and retrieve it later. An associative array is the simplest JavaScript structure to provide both easy storage and retrieval.

An associative array (often called a hash or hash table) is like a normal array, with one important difference: a normal array is indexed by integers, whereas an associative array can be indexed by arbitrary text strings. These text indexes are called hash keys.

For example, a normal array may have an assignment like this:

Arr[5] = "some text";

while an associative array can have assignments like this:

Arr["Charles Dickens"] = "Tale of Two Cities";

Luckily for you, JavaScript allows you to index arrays in either form.

When you talk to a server with XMLHttpRequest, you provide two pieces of information: the URL of the server script, and any parameters the script needs. Combining these makes an excellent array index. Whenever you retrieve data from the server for your Ajax application, you can save that data in an associative array using the URL and parameters as a key. The next time the user makes a request for the same URL, with the same parameters, you check for a matching entry in your array. If it's there, use the data you already have, rather than bothering the server again.

In the onreadystatechange function, you can save the data like this:

cache[url + parameters] = httpreq.responseText;

Later, before making another request to the server, check if a hash key is already defined in the cache array:

if (cache[url + parameters]) {
    response = cache[url + parameters];
else {
    //Not cached, so call server
    ...
}

Building a Better Cache

There's one problem with this approach. While the Ajax program is running in the browser, each new, distinct request gets another entry in the cache array. For some applications, the JavaScript array can get quite large, eating up memory on the client machine.

The solution is to modify the hack to limit the size of the array. You can just start deleting elements in the array once it's reached some arbitrary size, but how do you decide which elements to delete? One approach is to use a Least Recently Used (LRU) algorithm. We'll keep our cache in an ordered list (see Figure 9-7), where the top of the list contains the oldest element (the one used least recently), and the bottom of the list contains our freshest, newest data.

Figure 9-7. The LRU algorithm as a linked list

We'll still use an associative array, but we'll add some code to make it act like a linked list (also called a "queue"). A linked list is a set of data objects in which each element contains not just its data but also a pointer to the next data object in the list.

To show the LRU-based cache in action, let's look at the HTML code for the web page for a simple Ajax application that displays facts about planets in the solar system:

<HTML>

<HEAD>
<TITLE>Client-side Cache Test</TITLE>

<style>
    body,table,select { font-size: 12px; }
</style>

<script language="javascript" src="/cache_hack/xhr.js"></script>
<script language="javascript" src="/cache_hack/limited_cache.js"></script>
<script language="javascript" type="text/javascript">

function get_data (  ) {

    var lbPlanets = document.getElementById("lbPlanets");

    async_cmd ("/cgi-bin/cache/planets.cgi?", 
               "p=" + lbPlanets.value, 
               "divAnswer");

}

</script>

</HEAD>

<BODY>
<center>
<b>Client-side Content Caching</b><p>

<table style="border: 1px solid gray;" cellpadding="5" cellspacing="0" width="95%">

<tr>
    <td width="35%" bgcolor="#f0f0f0" valign="top">
        <form id="frmMain">
            Select a planet:
            <select id="lbPlanets" onChange="get_data(  );">
                <option value="mercury">Mercury</option>
                <option value="venus">Venus</option>
                <option value="earth">Earth</option>
                <option value="mars">Mars</option>
                <option value="jupiter">Jupiter</option>
                <option value="saturn">Saturn</option>
                <option value="uranus">Uranus</option>
                <option value="neptune">Neptune</option>
                <option value="pluto">Pluto</option>
            </select>
        </form>

        <div id="divAnswer">
        &nbsp;
        </div>
    </td>

    <td width="65%" bgcolor="#c0c0c0" valign="top">
        Cache Contents (oldest first):<p>
        <div id="divCacheContents">
            <b>Cache is empty</b>
        </div>
    </td>
</tr>

</table>

</center>
</BODY>

</HTML>

This creates the initial web page shown in Figure 9-8.

Figure 9-8. A client-side cache example

When a planet is selected from the pull-down select box, the JavaScript function get_data( ) is triggered. This function calls the function async_cmd( ), defined in the JavaScript file limited_cache.js, shown here:

var cache = new Array;

var top_key = null;
var prev_key = null;
var curr_cache_size = 0;

var MAX_CACHE_SIZE = 5;

//----------------------------------------------------------
// Display the contents of the client-side cache in a
// DIV tag
//----------------------------------------------------------
function show_cache_info (answer_from) {
    var divCache = document.getElementById("divCacheContents");
    divCache.innerHTML = "";
    var curr_key = top_key; 
    while (curr_key != null) {
        divCache.innerHTML = divCache.innerHTML 
                + "KEY: <b>" + curr_key 
                + "</b> VALUE: <b>" + cache[curr_key].value 
                + "</b><br>";
        curr_key = cache[curr_key].next;
    }

    divCache.innerHTML = divCache.innerHTML
            + "<p>Last answer retrieved from: <b>"
            + answer_from + "</b>";
}

//----------------------------------------------------------
// Asynchronous (non-blocking) server query
//----------------------------------------------------------
function async_cmd (url, parms, divname) {
    var httpreq = getHTTPObject(  );

    var divAnswer = document.getElementById(divname);

    //Precondition: must have a URL
    if (url == "") return;

    var cache_key = url + parms;

    //If this is a cacheable request, then first 
    //check if a response already exists for it

    if (cache[cache_key]) {
        divAnswer.innerHTML = "Answer: <b>" +
         cache[cache_key].value + "</b>";

        //Linked-list maintenance
        if (cache_key != prev_key) {
            var curr_key = top_key;

            if (cache_key != top_key) {
                //Find linked-list node preceding the 
                //cache[cache_key] node
                while (cache[curr_key].next != cache_key) {
                    curr_key = cache[curr_key].next;
                }
            }
            else {
                top_key = cache[top_key].next;
            }
            //Point preceding node to point to which 
            //cache[cache_key] currently points
            cache[curr_key].next = cache[cache_key].next;

            //Move cache[cache_key] to the end of our
            //linked list
            cache[prev_key].next = cache_key;
            cache[cache_key].next = null;
            prev_key = cache_key;
        }
        show_cache_info ("client-side cache");
    }
    else {
        //Send request to server
        httpreq.open("POST", url, true);

        //--------------------------------------------------
        // Response function
        //--------------------------------------------------
        httpreq.onreadystatechange = function (  ) {
            if (httpreq.readyState == 4) {
                var response = httpreq.responseText;

                if (curr_cache_size >= MAX_CACHE_SIZE) {
                    //Remove oldest item from cache
                    var oldest = top_key;
                    top_key = cache[oldest].next;
                    delete cache[oldest];

                }
                else {
                    curr_cache_size++;
                }

                //Linked-list maintenance
                if (top_key == null) {
                    top_key = cache_key;
                }
                if (prev_key != null) {
                    cache[prev_key].next = cache_key;
                }

                //Add answer we just retrieved into cache
                cache[cache_key] = { value:response, next:null };
                prev_key = cache_key;

                //Display answer in DIV tag
                divAnswer.innerHTML = "Answer: <b>" + response + "</b>";

                show_cache_info ("server");
            }
        }
        httpreq.send (parms);
    }
}

The async_cmd( ) function expects three parameters. The first two, url and parm, are the URL and parameters that make the XMLHttpRequest call. As the function name implies, the server call is asynchronous. When the server reply is received, the reponseText is copied into the HTML div tag specified by the third parameter, divname.

The async_cmd( ) function also contains all the code for storing and retrieving data from the client-side cache. You don't need to know anything about how the cache works (or even that it exists) to use the function.

The Cache in Action

In the example program, a user selects a planet from the pull-down list, triggering the JavaScript function get_data( ), which in turn calls the function aync_cmd( ). The job of this function is to request information from the server and use it to update a div tag. But before it bothers the server, it first builds a cache keyan index into the cache arrayby combining the URL and parameters. The program then checks the cache for a matching entry:

var cache_key = url + parms;

if (cache[cache_key]) {
    divAnswer.innerHTML = "Answer: <b>" + cache[cache_key].value + "</b>";

If a match is found, you use the data from the cache to update the page.

The cache value in the previous code is referenced as cache[cache_key].value. To turn the cache array into a linked list, you have to give each array element two properties. The first, value, holds the cached data. The second, called next, is a pointer to the index of the next item in the linked list.

When no match is found, we have to contact the server as usual, but after the data's been retrieved, we need to insert it into our cache. We're using our array to simulate a linked list, and we must also be careful to keep the list in LRU order and to make sure the list doesn't grow too big. The size of the cache is controlled by the variable MAX_CACHE_SIZE, which in our example is set to 5. In practice, the size of the cache depends on the needs of the application.

Our example program includes a function, show_cache_info( ), that displays the contents of the cache on the right side of the web page. This display is refreshed every time a new planet is selected. Figure 9-9 shows the application after the five outermost planets have been selected (starting with Pluto and moving inward).

Figure 9-9. After five selections, the cache is now full

The oldest (least recent) item in our cache is Pluto, and the newest item is Jupiter. Because five items have been selected, there's no more room in the cache. When the next planet, Mars, is selected, our aync_cmd( ) function needs to do some cache rearranging. The results are shown in Figure 9-10; Pluto, the oldest entry in the cache, is deleted to make room for Mars.

Figure 9-10. Pluto gets deleted to make room for Mars

Up to now, we've only rearranged the contents of the cache. Now let's see what happens when we select Saturn, an item inside the cache. The results are shown in Figure 9-11.

Figure 9-11. Saturn gets pulled from the cache

Saturn has moved from third position in the cache to the bottom, maintaining the list in Least Recently Used order. Using the cache also saves a trip to the server, which makes our application more responsive.

Hacking the Hack

If you want a cache size larger than five (and who doesn't?), you might consider adding a function to set the initial cache size. You can call it from your Ajax page's onLoad event handler.

In the async_cmd( ) function, when we're pulling a value from our cache, we need to find the list element that precedes our target one. We're moving the target element to the end of the list, and we need to splice the list back together. Our code to find the preceding element looks like this:

while (cache[curr_key].next != cache_key) {
    curr_key = cache[curr_key].next;
}

The code is simple, but it's woefully inefficient. Performance may suffer if the cache is very large. To remedy this problem, consider rewriting the cache as a doubly linked list. In a doubly linked list, each element points not just to the next element, but to the previous one as well.

There are cases in which you don't want to cache server data on the client. If you're receiving data that changes over time (for example, the current temperature in Denver), you'll always want to get this data from the server. Consider adding a parameter to async_cmd( ) to disable caching in these special cases.

Mark Pruett