CRISP distributed caching platform (a free download is available).

In the CRISP (Caching and Replication for Internet Service Performance) project, we investigate architectures and caching strategies for distributed caching. This is a collaboration with Syam Gadde and Jeff Chase of Duke University.

Proxy caching servers are widely used to improve performance of internet access. Unfortunately, they often become severe bottlenecks and hurt more than help the performance. Partitioning users among multiple proxies alleviates this problem at the cost of reduced sharing of caches (indeed, a user connected to one proxy cannot benefit from the cache maintained by another proxy). We propose a distributed architecture for a Web proxy system, which avoids the proxy bottleneck and increases its effective cache size without compromising the level of cache sharing.

The architecture contains two sets of servers: caching servers and mapping servers. Caching servers keep caches of objects, and mapping servers maintain the map of URLs to the IDs of the caching servers (if any) that have the corresponding objects. The namespace of all URLs is partitioned among mapping servers according to some well-known partitioning rule, so that each mapping server is responsible for some portion of URL namespace.

A client (browser) is normally connected to one of the caching servers as its proxy. When the caching server receives a client request, it either finds the requested object in its cache, or sends the request to a mapping server, chosen according to the partitioning rule. If the mapping server finds the URL in its map database, it returns the ID of a server that caches this object to the requesting caching server, which uses this information to obtain the object. (Note that an object may be cached by several caching servers, in which case the mapping server chooses one according to some policy.) If the object is not in the mapping server database, the caching server obtains and caches the object from the Internet, and informs the mapping server; the mapping server adds the location of the newly cached object to its mapping database.

Thus, the load is distributed among caching servers and mapping servers, and at the same time an object cached anywhere in the system is available to all users. An interesting configuration of this system is the one where caching proxies are co-located with the browsers, each browser is proxied to its local caching server, and the browser cache size is set to zero. In this case, the architecture converges to the one where all clients share their caches, which not only increases greatly the effective cache size, but also makes it scale with the number of users.

The architecture has been implemented by modifying the Squid proxy server and is available for download .

Publications

M. Rabinovich, S. Gadde, and J. Chase. Not all hits are created equal: cooperative proxy caching over a wide-area network. 3rd International WWW Caching Workshop. June 1998.

The paper describes a scalable model for cooperation between proxy caches in a global network.

S. Gadde, M. Rabinovich, and J. Chase. Reduce, Reuse, Recycle: An Approach to Building Large Internet Caches. Workshop on Hot Topics in Operating Systems (HotOS), May 1997.
This paper describes and motivates the arcutecture of CRISP cache.

S. Gadde, J. Chase, and M. Rabinovich. A Taste of Crispy Squid. Workshop on Internet Server Performance. June 1998.

S. Gadde, J. Chase, and M. Rabinovich. Directory Structures for Scalable Internet Caches. Technical Report CS-1997-18. Department of Computer Science, Duke University. November 1997.
This paper compares performance of CRISP and Harvest caches and discusses alternatives for managing the directory service for cooperative caches.