From Memcached to Redis to Surpdb

In this post, I'll describe my journey to find the perfect caching solution for my Django-based site linkrdr. After trying Memcached and Redis, I settled on surpdb. I guarantee you haven't heard of surpdb before, because I just finished writing it.

Querying and retrieving large data sets is not the Django ORM's sweet spot. There are a number of reasons why Django's QuerySets struggle with large amounts of data, but the primary one is that they simply weren't designed for that kind of workload. For linkrdr, retrieving and analyzing a user's data eventually outgrew every attempt at marrying Django QuerySets with some form of caching. Caching in Memcached to avoid going to the database on subsequent calls worked well except in one regard: the first request took so long that it would time out, so a user might never see their data at all.
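
For context, the Memcached approach looked roughly like the sketch below (the cache key scheme and the query helper are hypothetical stand-ins, not the real linkrdr code):

```python
# A minimal sketch of the Memcached-backed approach, assuming Django's
# CACHES setting points at a Memcached backend. Names are illustrative.
from django.core.cache import cache

def cached_user_links(user):
    key = 'user_links_{}'.format(user.id)
    links = cache.get(key)
    if links is None:
        # First request for this user: hit the database. On large data
        # sets, this is the call that was slow enough to time out.
        links = expensive_link_query(user)  # hypothetical query helper
        cache.set(key, links, 60 * 60)      # later requests come from cache
    return links
```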

Enter Redis. Current sweetheart of the NoSQL movement, Redis is a key-value store that keeps its entire dataset in memory. In addition to simple strings, values can take the form of a number of Redis's native types: list, set, hash, etc. Redis can optimize the space used by several of these types, though there is still (sometimes significant) overhead when storing large numbers of small objects.

Installing Redis was easy. I headed over to the Redis site and followed the installation instructions. Since it's written in C, it requires a quick compilation (so make sure you have gcc installed), and then you're ready to go. I changed the default redis.conf in the top-level directory and set daemonize yes so that Redis would run in daemon mode. After that, a simple ./src/redis-server redis.conf started a new Redis daemon listening on the default port.

But how was I going to make Redis any more useful than Memcached? I needed to rethink my entire strategy for structuring my data. The Django ORM encourages good database normalization, which is usually helpful: it keeps writes simple and the data consistent. Here, however, it meant that determining the links a user should see in their reading list required joining across four very large tables and returning a hierarchy of objects three levels deep.
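
A query of roughly this shape (the model and relation names are hypothetical stand-ins for the real schema) shows the problem:

```python
# A hypothetical illustration of the kind of normalized query involved;
# the actual linkrdr models differ, but the shape of the joins is the point.
links = (
    Link.objects
        .filter(entry__feed__subscription__user=user)  # JOINs across several large tables
        .select_related('entry', 'entry__feed')
)
# Iterating over the QuerySet materializes a three-level hierarchy of
# Python objects (Link -> Entry -> Feed) for every row, which is expensive
# when there are hundreds of thousands of rows.
for link in links:
    title = link.entry.feed.title
```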

I took a deep breath and began to rethink my data strategy. While the normalized structure was useful for inserting and updating records, it poorly modeled the data I actually needed to return to the user. I decided to create a new type of object that would live only in the Redis cache: a denormalized version of my Link model that additionally contained all of the fields from other models needed to display the data to the user. That way, I would be querying Redis (not my PostgreSQL database) for an object that already had all of the data I needed. To the Link model I added a to_dict() method that returned a dictionary of that data, which was saved directly in a Redis hash using the Link's id as the key.
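
In condensed form (using the redis-py client; the field names are illustrative and the real to_dict() carried more data), the pattern looks like this:

```python
# A condensed sketch of the denormalized caching pattern with redis-py.
# to_dict() is shown here as a standalone function for brevity; in the
# post it lives on the Link model itself.
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

def link_to_dict(link):
    # Denormalized view of a Link: everything the page needs, flattened,
    # so no joins are required at read time.
    return {
        'url': link.url,
        'title': link.title,
        'feed_title': link.entry.feed.title,  # pulled across the joins once, at write time
    }

def cache_link(link):
    # One Redis hash per Link, keyed on the Link's id.
    r.hset('link:{}'.format(link.id), mapping=link_to_dict(link))

def get_cached_link(link_id):
    # Read path: a single Redis lookup, no database access.
    return r.hgetall('link:{}'.format(link_id))
```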

Even then, however, I wasn't done. Redis was taking about 350MB to store my 560,000 Links. Since I host linkrdr on a Linode server, where physical memory is always at a premium, this was less than ideal. When I looked at the size of Redis's backup file, I was dismayed to see it was only 163MB. That meant over 50% of the space needed to store my data was Redis overhead: if the data fits in a 163MB file, it can theoretically be stored in memory in that amount of space. Some back-of-the-envelope calculations showed even 163MB was far more space than my data should require. I should make one thing very clear: this isn't Redis's fault. Though I tried my best to optimize my data structure according to the Redis optimization guidelines, Redis was not designed from scratch to store and retrieve my data structure as efficiently as possible.
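
The rough numbers behind that conclusion (all figures approximate):

```python
# Back-of-the-envelope math for the overhead claim above (approximate figures).
links = 560_000
in_memory = 350 * 1024 * 1024   # ~350MB resident in Redis
on_disk = 163 * 1024 * 1024     # ~163MB backup file

print(in_memory / links)        # ~655 bytes per Link in memory
print(on_disk / links)          # ~305 bytes per Link on disk
print(1 - on_disk / in_memory)  # ~0.53 -> over half the resident size is overhead
```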

But what if something was designed from the ground up to store my data structure? I may only be a journeyman web programmer, but I've been dealing with these types of design issues in C and C++ for my entire professional life. Having previously optimized a Django view using C++, I was comfortable with the interaction between Python and C++. I decided I would write my own NoSQL db according to my requirements. And surpdb was born.

surpdb (SUper Reliable Python DataBase) was built with a single goal in mind: minimize both retrieval time and memory usage for my data. Since I was only supporting storing data from Python, I could specialize the internal data structures to store Python objects rather than some abstract type. In addition, since I knew the surpdb instance would be running on the same machine as the Python code querying it, I could use shared memory for communication. This was attractive for a number of reasons. With shared memory, I didn't have to pay the cost of serializing an object, sending it over the wire, and deserializing and storing it. Instead, the Python process would store the object in a shared memory queue that surpdb would be watching. surpdb would then pick up the message and store it directly as a native Python object in its own shared memory data store. Using shared memory both for storage and communication also meant that I didn't need to give much thought to data backup. Since shared memory segments in Linux are represented as memory-mapped files with a lifetime outside that of the process that created them, the kernel would be responsible for asynchronously flushing my data to disk. In the event that surpdb crashed, recovery would be almost instantaneous, as the snapshot of the database's address space before the crash would be sitting there waiting to be attached to.
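
To illustrate just the attach-and-persist idea (surpdb itself is not written in Python and its protocol differs), here is a minimal sketch of two processes sharing data through a memory-mapped file, which is also how the segment outlives the process that wrote it:

```python
# A toy illustration of the shared-memory idea, not surpdb's actual
# implementation: a memory-mapped file persists outside the writer process,
# so another process can attach to the same bytes later.
import json
import mmap

PATH = '/tmp/surpdb_demo_segment'   # hypothetical segment name
SIZE = 4096

def write_record(record):
    # "Client" side: serialize a record into the shared segment.
    with open(PATH, 'wb') as f:
        f.truncate(SIZE)            # create a zero-filled segment
    with open(PATH, 'r+b') as f:
        mm = mmap.mmap(f.fileno(), SIZE)
        data = json.dumps(record).encode('utf-8')
        mm[:len(data)] = data
        mm.flush()                  # the kernel flushes the backing file regardless
        mm.close()

def read_record():
    # "Server" side: attach to the same segment and read the record back.
    with open(PATH, 'r+b') as f:
        mm = mmap.mmap(f.fileno(), SIZE)
        raw = mm[:].split(b'\x00', 1)[0]
        mm.close()
    return json.loads(raw.decode('utf-8'))

if __name__ == '__main__':
    write_record({'id': 1, 'url': 'http://example.com'})
    print(read_record())
```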

In surpdb, lookups are O(1), as one would expect, and there is very little overhead for storing objects. In fact, because I'm working with my own data, surpdb actually stores objects in less space than they would take in memory on their own. Since I know most of my data consists of URLs, I'm able to precache lookup tables of common substrings and store just an index into those tables on the object itself. This saves a drastic amount of space. I was able to store my 560,000 Link objects in only 13MB. That's an order of magnitude gain over Redis, achieved by specializing for my data. More importantly, it means the site can grow quite large without my needing to upgrade my Linode with more RAM.
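
The substring-table trick, simplified to pure Python for illustration (surpdb does this internally over its own storage, and its table is built from the data rather than hard-coded):

```python
# A simplified sketch of the precached-substring idea: common URL prefixes
# are stored once in a lookup table, and each object stores only a small
# index plus the remaining text. The table here is hard-coded for brevity.
COMMON_PREFIXES = ['http://www.', 'https://www.', 'http://', 'https://']

def encode_url(url):
    # Replace a shared prefix with its index in the table.
    for i, prefix in enumerate(COMMON_PREFIXES):
        if url.startswith(prefix):
            return (i, url[len(prefix):])
    return (-1, url)   # -1 means "no shared prefix"

def decode_url(index, rest):
    return rest if index == -1 else COMMON_PREFIXES[index] + rest

packed = encode_url('http://www.example.com/article/42')
print(packed)               # (0, 'example.com/article/42')
print(decode_url(*packed))  # original URL restored
```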

Update: Some Technical Details

A number of people have asked what data structure I'm using for hashing. The answer: none. As of an hour ago (when I realized I could do this), I removed the vector of pointers I was using and now rely on the integer key's value as an offset into shared memory (since my allocator guarantees all allocations are in contiguous memory). String storage is aggressively optimized (again, for my data): domain names are stored in a separate table and the remaining portion of the string is compressed, while non-URL strings are stored using LZMA compression. The obvious speed trade-off here was acceptable to me, as I'm not responding to thousands of requests a second.
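
As a toy illustration of the offset-addressing idea (the real thing is C++ over a shared-memory segment; the record size and layout here are made up):

```python
# Toy illustration of offset addressing: with fixed-size records in one
# contiguous buffer, the integer key alone locates a record -- no hash
# table required. LZMA is used for the payload, as in the post.
import lzma
import struct

RECORD_SIZE = 256                       # hypothetical fixed record size
store = bytearray(RECORD_SIZE * 1000)   # contiguous "segment" for 1000 records

def put(key, text):
    payload = lzma.compress(text.encode('utf-8'))
    assert len(payload) + 2 <= RECORD_SIZE, 'payload too large for this toy'
    offset = key * RECORD_SIZE                     # the key *is* the location
    store[offset:offset + 2] = struct.pack('<H', len(payload))
    store[offset + 2:offset + 2 + len(payload)] = payload

def get(key):
    offset = key * RECORD_SIZE
    (length,) = struct.unpack_from('<H', store, offset)
    payload = bytes(store[offset + 2:offset + 2 + length])
    return lzma.decompress(payload).decode('utf-8')

put(42, 'some non-URL string worth compressing')
print(get(42))
```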

The end result of all of this is that I finally have a caching solution that meets my needs. Pages are dynamically generated in under a second, even for thousands of links. I don't have to worry about more data requiring a machine upgrade (at least, not for a very long time). I also don't have to worry about upgrades, support, or learning the tricks of a new cache: I'm intimately familiar with the inner workings of surpdb (which makes sense, since I wrote it) and can optimize as I go. Lastly and most importantly, I no longer have to worry about the linkrdr infrastructure and can focus on improving the user experience.
