thoughts on architectures and caching

Thursday, June 03, 2004, at 10:22AM

By Eric Richardson

99% of you are going to want to ignore this post. I need to write something out, and this is a convenient place to do it. It's technical, though, so if you're not into that, feel free to do something else. Maybe you want to read it to see how oddly my mind works, but I doubt it. That's why I'm hiding all the content after the jump. Click through if you really want to see it.

Ok, so my thoughts today are on some eThreads work I've been doing. I'm rewriting the whole thing to run as a mod_perl handler under Apache 2 (well, borrowing a lot of old code isn't quite rewriting, but whatever, the architecture's new). I've had this funky idea lately to break the whole page rendering process into three steps. Each step would be handled by a different handler, so each one can optimize its own caching, etc.

Step 1: Figure out which container they want and which look we're using. Parse in whatever templates they want to include (maybe a header and a footer?).

Step 2: Parse in glomule data. Glomules are collections of data, and a template can include data from more than one glomule. Look for glomule tags and replace them with data. Also run any plugins that affect the glomule itself.

Step 3: Run any plugins that don't affect data. For instance, the recent comments one in the right sidebar.
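To make the three-step split concrete, here's a minimal sketch in Python (not eThreads' actual Perl) where each step is just a function from page text to page text. The tag syntax ({template "name"/}, {glomule "name"/}, {recent_comments/}), the data dictionaries, and all function names are hypothetical stand-ins, not the real eThreads handlers.

```python
import re
from typing import Callable, List

# Hypothetical data sources; in eThreads these would come from the
# container/look configuration and the glomule data store.
TEMPLATES = {"header": "<html><body>", "footer": "</body></html>"}
GLOMULES = {"rantings": ["First post", "Second post"]}

def step1_container(page: str) -> str:
    """Step 1: pull in included templates; leave every other tag alone."""
    return re.sub(r'\{template "(\w+)"/\}',
                  lambda m: TEMPLATES[m.group(1)], page)

def step2_glomules(page: str) -> str:
    """Step 2: replace glomule tags with that glomule's data."""
    def render(m):
        items = "".join(f"<li>{e}</li>" for e in GLOMULES[m.group(1)])
        return f"<ul>{items}</ul>"
    return re.sub(r'\{glomule "(\w+)"/\}', render, page)

def step3_plugins(page: str) -> str:
    """Step 3: run plugins that don't touch glomule data, e.g. a sidebar."""
    return page.replace("{recent_comments/}", "<div>Recent comments</div>")

# Each step is a separate entry, so each could be swapped out or
# given its own cache independently of the others.
PIPELINE: List[Callable[[str], str]] = [step1_container, step2_glomules, step3_plugins]

def render_page(page: str) -> str:
    for step in PIPELINE:
        page = step(page)
    return page
```

Note that step1_container leaves {glomule ...} untouched; it's simply not its job.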

So anyway, that's all background. What's on my mind right now is how you get your template from step 1 to step 2 to step 3.

In the current eThreads model you register handlers for template tags. Those handlers can be just flat data, or they can be routines that do something and then return the results. You register everything and then start the template processor. The template gets parsed sequentially, with pieces printed out as they're done. Everything's done in one swipe, so every handler is already registered by the time its tag comes up.
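Roughly, the current single-pass model looks like this. It's a minimal sketch, assuming a simple {name/} tag syntax and a hypothetical TemplateProcessor class: handlers are either a flat string or a zero-argument callable, and pieces are yielded as soon as they're ready instead of building the whole page first. Treating unknown tags as empty output is also an assumption.

```python
import re
from typing import Callable, Dict, Iterator, Union

# A handler is either flat data or a routine that returns data.
Handler = Union[str, Callable[[], str]]

TAG = re.compile(r"\{(\w+)/\}")  # hypothetical {name/} tag syntax

class TemplateProcessor:
    def __init__(self) -> None:
        self.handlers: Dict[str, Handler] = {}

    def register(self, tag: str, handler: Handler) -> None:
        self.handlers[tag] = handler

    def process(self, template: str) -> Iterator[str]:
        """Walk the template once, yielding each piece as soon as it's done."""
        pos = 0
        for m in TAG.finditer(template):
            yield template[pos:m.start()]                 # plain text before the tag
            handler = self.handlers.get(m.group(1), "")   # unknown tags render as nothing
            yield handler() if callable(handler) else handler
            pos = m.end()
        yield template[pos:]                              # trailing text

proc = TemplateProcessor()
proc.register("title", "My Blog")            # flat data
proc.register("year", lambda: str(2004))     # routine
for piece in proc.process("<h1>{title/}</h1><p>{year/}</p>"):
    print(piece, end="")
```

Because every handler is registered before processing starts, a single sequential pass is enough.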

But with the new container/glomule model, that's not going to be the case. Now the template itself gets to determine which glomule we're working on, which function we're running (basically what data's getting pulled), etc. So the old model isn't going to work quite the same way.

And that's before you even throw in the new three step process.

Let's say you start out with this simple template:

{template "header"/}

{blog 
  glomule   => "rantings",
  function  => ""
}
  ...do something...
{/blog}

{template "footer"/}

Step 1 is going to handle the {template} tags, putting the header/footer content in their place. The {blog} tag is going to be handled by step 2, though, so step 1 needs to ignore it and let it pass through to the next step.

There are a couple of ways to go about that:

Complete Steps: One way would be to have each step operate on a complete template. Step 1 parses what it knows and then prints all the {blog}...{/blog} stuff back in, because it doesn't know what to do with it. The complete page (still with {blog} template code) then goes out of step 1 and into step 2, where it's picked apart into elements again. Step 2 then handles what it knows and passes whatever it doesn't off to step 3. Pros? Easy and clean. Cons? Well, you're going text -> parsed tree -> text -> parsed tree -> text -> parsed tree -> text, parsing the same page three times. I think you can see how there's a little inefficiency there.
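Here's a sketch of the Complete Steps approach, assuming a toy parser for the {name ...}...{/name} and {name .../} syntax above. parse, serialize, resolve, and run_step are hypothetical names, and there's no error handling for malformed templates. Each step parses the full text, replaces the tags it has handlers for, and writes everything else back out as text for the next step to parse all over again.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

# {name args} opens, {/name} closes, {name args/} is self-closing.
TAG = re.compile(r"\{(/?)(\w+)([^{}]*?)(/?)\}")

@dataclass
class Tag:
    name: str
    args: str
    self_closing: bool
    children: List["Node"] = field(default_factory=list)

Node = Union[str, Tag]
Handlers = Dict[str, Callable[[Tag], str]]

def parse(text: str) -> List[Node]:
    """text -> parsed tree"""
    root: List[Node] = []
    stack = [root]
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            stack[-1].append(text[pos:m.start()])
        pos = m.end()
        closing, name, args, self_close = m.groups()
        if closing:
            stack.pop()
        else:
            tag = Tag(name, args, bool(self_close))
            stack[-1].append(tag)
            if not self_close:
                stack.append(tag.children)
    if pos < len(text):
        stack[-1].append(text[pos:])
    return root

def resolve(nodes: List[Node], handlers: Handlers) -> List[Node]:
    """Replace tags this step knows; keep the rest for the next step."""
    out: List[Node] = []
    for n in nodes:
        if isinstance(n, Tag) and n.name in handlers:
            out.append(handlers[n.name](n))
        else:
            if isinstance(n, Tag):
                n.children = resolve(n.children, handlers)
            out.append(n)
    return out

def serialize(nodes: List[Node]) -> str:
    """parsed tree -> text, printing unknown tags back in."""
    out = []
    for n in nodes:
        if isinstance(n, str):
            out.append(n)
        elif n.self_closing:
            out.append("{%s%s/}" % (n.name, n.args))
        else:
            out.append("{%s%s}%s{/%s}" % (n.name, n.args, serialize(n.children), n.name))
    return "".join(out)

def run_step(text: str, handlers: Handlers) -> str:
    return serialize(resolve(parse(text), handlers))

page = '{template "header"/}\n{blog glomule => "rantings"}\n  ...\n{/blog}\n{template "footer"/}'
step1 = {"template": lambda t: {"header": "<html>", "footer": "</html>"}[t.args.strip().strip('"')]}
step2 = {"blog": lambda t: "<p>posts from rantings</p>"}
after1 = run_step(page, step1)    # {blog} survives as text...
after2 = run_step(after1, step2)  # ...and gets parsed all over again here
```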

Pass Perl: Another solution would be to pass around some representation of the parsed template in Perl, which steps 2 and 3 could then just eval back into memory instead of reparsing. Then at the very end of the chain you run some sort of cleanup process that makes sure you're actually sending the viewer real HTML. Pros? Lots of potential to be fast. Cons? Another step, another thing to go wrong.
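The Pass Perl idea could look something like this. In Perl you'd probably reach for something like Data::Dumper output plus eval; this sketch swaps in Python's repr and ast.literal_eval as the closest analog. The nested-list tree layout (["tag", name, args, children]) and the freeze/thaw/cleanup names are assumptions, and the cleanup here simply drops any tag nobody handled, which is only one possible policy.

```python
import ast
from typing import Any, List

# Hypothetical tree layout: a str is literal HTML, and a list of the form
# ["tag", name, args, children] is a template tag nobody has handled yet.
Tree = List[Any]

def freeze(tree: Tree) -> str:
    """End of one step: write the tree out as Python literal source."""
    return repr(tree)

def thaw(source: str) -> Tree:
    """Start of the next step: rebuild the tree without reparsing template
    syntax. ast.literal_eval only accepts literals, unlike a bare eval()."""
    return ast.literal_eval(source)

def cleanup(tree: Tree) -> str:
    """End of the chain: emit the HTML and drop any tag nobody handled."""
    return "".join(node for node in tree if isinstance(node, str))

tree = ["<html>", ["tag", "blog", ' glomule => "rantings"', ["...do something..."]], "</html>"]
wire = freeze(tree)  # what step 1 would hand to step 2

# Step 2 thaws the tree and handles the blog tag in place.
step2_tree = ["<p>posts</p>" if isinstance(n, list) and n[1] == "blog" else n
              for n in thaw(wire)]
```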

I'm sure there are others, but my mind's moving on, so my fingers will too.

Obviously, looking at that problem raises the question: why break it up into three handlers? Why not do it the way things are done now, in one fell swoop?

First of all, the process has to change to accommodate moving functionality specs into the template.

Moving to logically separated steps gives you some fun flexibility. For instance, step 1 has a very simple, well-defined job: it figures out which template we're using. How much can you optimize that job for your specific setup? Maybe you run on only one machine and your number one concern is speed. So maybe you rework TemplateFrontEnd to use all flat files and to contain as little logic as possible. That's a lot easier when you're working in multiple steps.
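As an example of that kind of specialization, here's a sketch of a stripped-down step 1 that only reads flat files and caches them in memory. FlatFileFrontEnd, the <dir>/<name>.tmpl layout, and the {template "name"/} tag are hypothetical; this is not the real TemplateFrontEnd.

```python
import os
import re
import tempfile
from typing import Dict

class FlatFileFrontEnd:
    """Hypothetical speed-first step 1: includes are flat files on disk,
    each read once and then served from an in-memory cache."""

    INCLUDE = re.compile(r'\{template "(\w+)"/\}')

    def __init__(self, template_dir: str) -> None:
        self.template_dir = template_dir
        self._cache: Dict[str, str] = {}

    def load(self, name: str) -> str:
        if name not in self._cache:
            path = os.path.join(self.template_dir, name + ".tmpl")
            with open(path, encoding="utf-8") as f:
                self._cache[name] = f.read()
        return self._cache[name]

    def render(self, page: str) -> str:
        # Expand includes only; {blog} and friends pass through for step 2.
        return self.INCLUDE.sub(lambda m: self.load(m.group(1)), page)

with tempfile.TemporaryDirectory() as d:
    for name, body in {"header": "<html><body>", "footer": "</body></html>"}.items():
        with open(os.path.join(d, name + ".tmpl"), "w", encoding="utf-8") as f:
            f.write(body)
    front = FlatFileFrontEnd(d)
    page = front.render('{template "header"/}{blog}...{/blog}{template "footer"/}')
```

After the first request the files aren't touched again, which only makes sense on a single machine where templates rarely change.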

I think maybe what I need to do is just optimize the template parser to ignore tags it doesn't see registered. Right now it doesn't care: it takes the template and builds it into a tree no matter what's in it. But maybe it should care. Then when step 1 sees {blog} it can be like "ummm, don't care" and not even look at anything inside that enclosure. Maybe at that point template parsing is fast enough that the reparse doesn't even matter. Right now, on a PII 400, I can parse the most complicated template I've got into a tree about 66 times a second. Your template gets simpler as each step progresses (more HTML, fewer tags), but say every step still had to do a full parse. Pretending for a moment that no other operation takes any time, three parses per request means that PII 400 could serve at most around 20 eThreads requests/sec (66 / 3 = 22). I think I can live with that.
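A sketch of the "ummm, don't care" parser: it only builds nodes for registered tags, and when it hits an unregistered enclosure it jumps straight to the matching {/name} and keeps the whole thing as raw text without tokenizing its insides. lazy_parse and the node layout are hypothetical, and for brevity it assumes an unregistered tag is never nested inside another tag of the same name.

```python
import re
from typing import List, Optional, Set, Tuple, Union

# Opening or self-closing tag; closing tags like {/blog} never match
# because "/" is not a word character.
TAG = re.compile(r"\{(\w+)([^{}]*?)(/?)\}")

# A node is raw text, or (name, args, body) for a registered tag, where
# body is the unparsed inner text (None for self-closing tags).
Node = Union[str, Tuple[str, str, Optional[str]]]

def lazy_parse(text: str, registered: Set[str]) -> List[Node]:
    nodes: List[Node] = []
    pos = 0
    while True:
        m = TAG.search(text, pos)
        if m is None:
            break
        name, args, self_close = m.groups()
        if self_close:
            end, body = m.end(), None
        else:
            close = "{/" + name + "}"
            close_at = text.find(close, m.end())
            if close_at == -1:
                raise ValueError("unclosed {" + name + "} tag")
            end, body = close_at + len(close), text[m.end():close_at]
        if name in registered:
            nodes.append(text[pos:m.start()])
            nodes.append((name, args, body))
        else:
            # "ummm, don't care": keep the whole enclosure as raw text and
            # resume scanning after its closing tag, never looking inside.
            nodes.append(text[pos:end])
        pos = end
    nodes.append(text[pos:])
    return [n for n in nodes if n != ""]
```

Registered enclosures come back with their body as raw text, so a caller could run lazy_parse on the body again only when it actually needs to.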

And the point of this was... I have no clue.