Making Django and Rails Play Nice, Part 3: Caching

Friday, March 16, 2012, at 05:04AM

By Eric Richardson

This is the third section of a multi-part look at some of the issues that we faced in developing KPCC's new beta website, which is written in Ruby on Rails but runs side-by-side with the existing Django site. Part one looked at mapping generic relationships with MySQL views, while part two talked about adapting sessions to allow them to wander from one site to the other.

There are plenty of ways to do caching. Page caches, partial caches, timestamp-based caches, versioned caches... they've all got a valid place in the caching arsenal. In Rails, the convention is to build sweepers that get fired when model objects are saved, giving you a spot to handle expiration or rebuilding of caches. But what do you do when Django needs to be the one to expire your Rails caches?

Django's default cache is built around memcache and expiration times, with a version number prefix that can be incremented to change keys and start over when content changes.
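
As a rough illustration of that versioned-key scheme, the sketch below mirrors the shape of Django's default key function, which joins a prefix, a version number, and the key with colons. The make_key helper here is a standalone stand-in written for this post, not code from our site.

```python
def make_key(key, key_prefix="", version=1):
    """Build a full cache key shaped like Django's default: prefix:version:key."""
    return "%s:%s:%s" % (key_prefix, version, key)


# With an empty prefix and version 1, the stored key gains a ":1:" prefix.
old_key = make_key("hsection:headlines")

# Incrementing the version yields a different key, so entries written under
# the old version are never read again and are left to time out.
new_key = make_key("hsection:headlines", version=2)

print(old_key)
print(new_key)
```

Bumping the version is cheap, but it invalidates everything under that prefix at once, which is part of why it is a blunt tool for per-story updates.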

That was the setup when I first got to the station, but expiration-based caches have a number of consequences that are less than ideal. First, unless you are keeping track of versioning and updating that key prefix, you're always going to have a delay before the cache times out and updates show up on the site. Keep that timeout too low, though, and you'll expire caches unnecessarily. That creates a situation where most page views are fast, but every time the cache expires some unlucky visitor is going to get burned as we rebuild the entire page.

Last summer we swapped that setup out for a new system that tries to do a cleaner job of expiring caches only when the content in them changes. Taking advantage of Redis and its set data type, the system keeps a set for each piece of content and stashes in it the key of every cache that contains that content. When that piece of content is updated, the cache system expires every cache in that object's set.
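
To make the idea concrete, here is a minimal in-memory sketch of that bookkeeping. It is not the production code: a plain dict stands in for the cache, Python sets stand in for the Redis sets, and the ContentCache class and the second story key are made up for the example.

```python
from collections import defaultdict


class ContentCache:
    """In-memory stand-in: a dict plays the cache, sets play Redis sets."""

    def __init__(self):
        self.values = {}                         # cache key -> cached value
        self.keys_by_object = defaultdict(set)   # object key -> cache keys

    def set(self, key, value, objects=()):
        self.values[key] = value
        # Record this cache key in the set of every object it depends on.
        for obj_key in objects:
            self.keys_by_object[obj_key].add(key)

    def get(self, key, default=None):
        return self.values.get(key, default)

    def expire_obj(self, obj_key):
        # Drop every cache that was built from this piece of content.
        for key in self.keys_by_object.pop(obj_key, set()):
            self.values.pop(key, None)


cache = ContentCache()
cache.set("homepage", "<html>...</html>",
          objects=["news/story:21563", "news/story:21564"])
cache.set("sidebar", "<ul>...</ul>", objects=["news/story:21564"])

# Saving story 21563 expires only the caches that included it.
cache.expire_obj("news/story:21563")
print(cache.get("homepage"))
print(cache.get("sidebar"))
```

The homepage cache disappears because it listed the updated story, while the sidebar cache survives because it never did.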

While that move was made well before the plan to implement Rails, it turns out the system didn't need much change to power both systems at once. Make sure the cache keys are stored in full form with any framework-specific prefixing, and all of a sudden a content update in Django can expire caches put in place by Rails.

Listening in to the broadcast

Going a step further, what if you want a content update to trigger a cache rebuild, rather than leaving that action to trigger when a poor user hits an uncached view?

Again, Redis comes to the rescue. The database includes a Pub/Sub messaging system that allows an easy abstraction between publishers and subscribers.

In Django, I set up a post_save action on our content that triggers a PUBLISH action with a JSON object containing the content's unique key and some commonly-needed bits of metadata (what is the content's publish status? what action is this: a save, a publish, an unpublish?).
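
A message along those lines might be built as in the sketch below. The field names, the channel name, and the status value are all assumptions picked for illustration, and a plain list stands in for the Redis channel so the example runs without a server.

```python
import json


def build_message(obj_key, action, status):
    """Serialize a content change as a JSON string for a Pub/Sub channel."""
    payload = {"key": obj_key, "action": action, "status": status}
    return json.dumps(payload, sort_keys=True)


published = []  # stand-in for the Redis channel


def publish(channel, message):
    """Hypothetical publisher; a real one would hand this to Redis PUBLISH."""
    published.append((channel, message))


publish("content-updates",
        build_message("news/story:21563", "publish", "live"))
print(published[0][1])
```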

Using the worker listener in Resque as a model, I then set up a Rails process to sit and listen on that same channel. When it sees a change that should update the contents of the homepage, it fires a process that rebuilds our Sphinx caches and then caches new views.
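
The real listener is a Rails process, so the Python below is only meant to show the shape of the dispatch logic: decode each message, decide whether it affects the homepage, and kick off a rebuild if so. The message fields and the rule for which actions matter are assumptions, not what our worker actually checks.

```python
import json

# Hypothetical rule: only these actions change what the homepage shows.
HOMEPAGE_ACTIONS = {"publish", "unpublish"}


def handle_message(raw, rebuild_homepage):
    """Decode one Pub/Sub message and rebuild the homepage if it matters."""
    message = json.loads(raw)
    if message.get("action") in HOMEPAGE_ACTIONS:
        rebuild_homepage(message["key"])
        return True
    return False


rebuilt = []
incoming = [
    '{"key": "news/story:21563", "action": "save", "status": "draft"}',
    '{"key": "news/story:21563", "action": "publish", "status": "live"}',
]
for raw in incoming:
    handle_message(raw, rebuilt.append)

print(rebuilt)
```

Passing the rebuild step in as a callable keeps the listener loop ignorant of how the caches actually get rebuilt.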

Getting Fancy... What if you want Rails to cache for Django?

Want to get super-fancy? Use that Rails worker to cache back to Django.

It's actually not that hard, thanks to RubyPython:

if Rails.env == "production"
  RubyPython.start(:python_exe => "/usr/local/python2.7.2/bin/python")
else
  RubyPython.start()      
end

@pickle = RubyPython.import("cPickle")

Now you have a Ruby interface to Python's Pickle, the important piece you need to write caches that Django will be happy to open:

# if we're passed a pickle object, also perform django headlines caching
if pickle
  (Rails.cache.instance_variable_get :@data).set(
    ':1:hsection:headlines',
    pickle.dumps( view.render(
        :partial => "home/cached/django/headlines", 
        :object => scored_content[:headlines]) 
    ),
    :raw => true
  )
end

Nifty.
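
Seen from the Python side, the reason pickle matters is the round trip sketched below. It assumes the Django cache backend unpickles whatever it reads, so the worker has to write pickled bytes rather than a bare string. The HTML snippet is a placeholder.

```python
import pickle

html = "<ul><li>Headline</li></ul>"

# What the worker would write to the cache, as raw bytes.
stored = pickle.dumps(html)

# What a pickle-based cache backend would hand back to the Django view.
restored = pickle.loads(stored)

print(restored == html)
```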

Rails Code

The Rails side of our caching code is available in the redis-content-store gem on SCPR's Github account. It includes the underlying caching code and simple view helpers for cache_content and register_content.

It's not perfect—my next step is to put some work into how nested caches would keep track of their content objects correctly—but it's working well for our purposes.

Django Code

The Django code can be found in this Gist:

from django.core.cache import cache

DEFAULT_TIMEOUT = 0
SET_PREFIX = "obj:"
FSET_PREFIX = "sobj:"


def set(key, val, timeout=DEFAULT_TIMEOUT, objects=None):
    """
    Unlike the normal django / redis cache, ContentCache takes a collection
    of objects that this cache depends on. This allows the cache to be
    expired automatically when expire_obj() is called for one of those
    objects. ContentBase models should automatically expire their caches
    on save.

    objects array can include stringified keys or objects that support a
    obj_key() method (such as ContentBase models).

        from contentbase import cache
        cache.set(key,val,0,['news/story:21563',segmentObject])
    """
    # first, set the object
    cache.set(key, val, timeout)

    # Make the full key
    fullkey = cache.make_key(key)
    fset_key = "%s%s" % (FSET_PREFIX, fullkey)

    # expire this key from existing sets
    for skey in cache._client.smembers(fset_key):
        cache._client.srem(skey, fullkey)

    # now add this key to each object's set
    keys = []
    for obj in objects or []:
        if obj is None:
            # skip empty entries
            continue
        elif hasattr(obj, 'obj_key'):
            okey = SET_PREFIX + obj.obj_key()
        else:
            okey = SET_PREFIX + obj
        cache._client.sadd(okey, fullkey)
        keys.append(okey)

    # create our forward mapping (sadd needs at least one member)
    cache._client.delete(fset_key)
    if keys:
        cache._client.sadd(fset_key, *keys)

#----------

def get(key, default=None, version=None):
    return cache.get(key, default, version)

#----------

def expire_obj(obj):
    """
    Expire all caches that depend on a given object.

        from contentbase import cache
        cache.expire_obj(obj)

    obj can be a string or an object that supports a obj_key() method.
    """
    key = obj
    if hasattr(obj, 'obj_key'):
        key = obj.obj_key()

    mem = cache._client.smembers(SET_PREFIX + key)
    if mem:
        cache._client.delete(*mem)

#----------

def expire_from_signal(sender, **kwargs):
    expire_obj(kwargs['instance'])
# Template tags: cache_content and register_content
from django.template import Library, Node, TemplateSyntaxError, Variable, VariableDoesNotExist
from django.template import resolve_variable
from mercer.contentbase import cache
from django.utils.encoding import force_unicode
from django.utils.http import urlquote

register = Library()


class CacheNode(Node):
    def __init__(self, nodelist, expire_time_var, fragment_name, vary_on):
        self.nodelist = nodelist
        self.expire_time_var = Variable(expire_time_var)
        self.fragment_name = fragment_name
        self.vary_on = vary_on

    def render(self, context):
        try:
            expire_time = self.expire_time_var.resolve(context)
        except VariableDoesNotExist:
            raise TemplateSyntaxError('"cache" tag got an unknown variable: %r' % self.expire_time_var.var)
        try:
            expire_time = int(expire_time)
        except (ValueError, TypeError):
            raise TemplateSyntaxError('"cache" tag got a non-integer timeout value: %r' % expire_time)

        # run nodelist looking for content objects
        context.REGISTERED_OBJECTS = []

        # Build a unicode key for this fragment and all vary-on's.
        cache_key = self.fragment_name + ':'.join([urlquote(resolve_variable(var, context)) for var in self.vary_on])
        value = cache.get(cache_key)
        if value is None:
            value = self.nodelist.render(context)
            cache.set(cache_key, value, expire_time, context.REGISTERED_OBJECTS)
        return value

#----------

class RegisterNode(Node):
    def __init__(self, *args):
        assert len(args)
        self.args = list(args)

    def render(self, context):
        obj = resolve_variable(self.args[0], context)
        context.REGISTERED_OBJECTS.append(obj)
        return ''

#----------

def do_cache(parser, token):
    nodelist = parser.parse(('endcache',))
    parser.delete_first_token()
    tokens = token.contents.split()
    if len(tokens) < 3:
        raise TemplateSyntaxError(u"'%r' tag requires at least 2 arguments." % tokens[0])
    return CacheNode(nodelist, tokens[1], tokens[2], tokens[3:])

#----------

def register_content(parser, token):
    """
    Registers a ContentBase object to the current cache.

    Usage:
        {% register_content <content obj> [<scheme>] %}
    """
    return RegisterNode(*token.contents.split()[1:])


register.tag('cache_content', do_cache)
register.tag('register_content', register_content)

{# Example template usage #}
{% cache_content 0 hsection:national %}
  <h3>National News</h3>
  <ul class="related-links">
    {% for item in national|slice:":3" %}
      {% register_content item %}
      <li><a class="news-link" href="{{ item.get_absolute_url }}">{{ item.headline }}</a></li>
    {% endfor %}
  </ul>
{% endcache %}