RealServer logs

Saturday, March 24, 2001, at 06:08AM

By Eric Richardson

So today I'm taking a little break from eThreads work to mess around with RealServer log parsing. A while back I wrote some scripts for parsing the log files we had at Gospelcom, so I've got a good start. Here's the kind of stuff I have to deal with.

First off, RealServer logs a ton of data. I love it for that, but it also makes the logs a big pain to make sense of. Take a look at a typical log line:


216.206.161.5 - - [03/Jan/2001:14:03:42 -0500] "GET live/wayfm.rm? RTSPT/1.0" 200 0 [Win98_4.10_6.0.9.357_plus32_SP80_en-US_586] [99125741-e1a0-11d4-f946-8dcaefb2247a] [Stat3:549|0|STOP|;] 0 0 0 0 0 3942


That's nothing, though... Look at the regular expression I had to come up with just to parse this stuff:


m!^(\S+) - - \[(\d\d)/(\w+)/(\d\d\d\d):(\d\d):(\d\d):(\d\d) -0\d00\] "GET (\S+) ([^"]+)" \S+ (\S+) \[([^]]+)\] \[([^]]+)\] (?:\[Stat1: ([^]]+)\])?\s?(?:\[Stat2: ([^]]+)\])?\s?(?:\[Stat3:([^]]+)\])?\s?(?:\[UNKNOWN\])? (\S+) (\S+) (\S+) (\S+) (\S+) (\S+)!i;


That's a lot of regular expression. And once I've got the data, I still have to figure out the best thing to do with it. With that much data per entry, the totals add up quickly, and at that scale efficiency becomes very important.
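A minimal sketch, not the actual Gospelcom scripts, of putting that pattern to work. It assumes the pattern above with its square brackets escaped, compiles it once with qr// instead of rebuilding it per line, and reads the log one line at a time so only a running tally stays in memory. The field names in @FIELDS and the subs parse_line and count_by_file are hypothetical; the six trailing numbers are left generic because their meaning isn't covered here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The log pattern, compiled once with qr// so it isn't rebuilt per line.
my $LOG_RE = qr!^(\S+) - - \[(\d\d)/(\w+)/(\d\d\d\d):(\d\d):(\d\d):(\d\d) -0\d00\] "GET (\S+) ([^"]+)" \S+ (\S+) \[([^]]+)\] \[([^]]+)\] (?:\[Stat1: ([^]]+)\])?\s?(?:\[Stat2: ([^]]+)\])?\s?(?:\[Stat3:([^]]+)\])?\s?(?:\[UNKNOWN\])? (\S+) (\S+) (\S+) (\S+) (\S+) (\S+)!i;

# Hypothetical names for the 21 capture groups, in order.
# The six trailing numeric fields are left generic (extra1..extra6).
my @FIELDS = qw(
    ip day month year hour min sec
    file protocol bytes player guid
    stat1 stat2 stat3
    extra1 extra2 extra3 extra4 extra5 extra6
);

# Turn one log line into a hash ref, or return undef if it doesn't match.
# Optional Stat groups that aren't present come back as undef.
sub parse_line {
    my ($line) = @_;
    my @caps = $line =~ $LOG_RE or return;
    my %rec;
    @rec{@FIELDS} = @caps;
    return \%rec;
}

# Read a filehandle line by line, counting hits per requested file.
# Memory grows with the number of distinct files, not the number of lines.
sub count_by_file {
    my ($fh) = @_;
    my %hits;
    my $skipped = 0;
    while (my $line = <$fh>) {
        chomp $line;
        my $rec = parse_line($line);
        if (!$rec) { $skipped++; next }
        $hits{ $rec->{file} }++;
    }
    return (\%hits, $skipped);
}

# Usage: perl count.pl access.log   (script name is just an example)
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "can't open $ARGV[0]: $!";
    my ($hits, $skipped) = count_by_file($fh);
    printf "%6d  %s\n", $hits->{$_}, $_ for sort keys %$hits;
    print "skipped $skipped unparsed lines\n";
}
```

Keeping a tally rather than an array of every parsed record is the main efficiency choice here; lines that don't match are counted so format surprises don't silently disappear. Note the -0\d00 piece only accepts time zones west of GMT with a leading zero, such as -0500.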


Right now I'm parsing the log file and dumping the data into a SQL table. From there I'll probably write a script to pull records out into a Storable file so I can play with them there.
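The Storable step can be pretty short with the core Storable module. A minimal sketch, assuming the records come out of the SQL table as an array of hash refs; the field names and sample values are made up for illustration, and a temp file from File::Temp stands in for wherever the real file would live.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# Hypothetical records shaped like parsed log entries (illustrative only).
my @records = (
    { ip => '216.206.161.5', file => 'live/wayfm.rm?', stat3 => '549|0|STOP|;' },
    { ip => '192.0.2.1',     file => 'live/wayfm.rm?', stat3 => undef },
);

# Temp file path; the file is removed when the script exits.
my (undef, $path) = tempfile(UNLINK => 1);

# Write the whole array to disk in Storable's binary format.
store(\@records, $path) or die "store failed: $!";

# Later, or from another script, load it back as an array ref.
my $loaded = retrieve($path);
printf "%d records, first from %s\n", scalar @$loaded, $loaded->[0]{ip};
# prints: 2 records, first from 216.206.161.5
```

Storable also has nstore, which writes in network byte order so the file can be read back on a machine with a different architecture.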