Logging in JSON

Thursday, April 05, 2012, at 01:12AM

By Eric Richardson

I happened to wander past the node.js blog yesterday and read this article on Bunyan, a logging framework built around JSON.

It was a fortuitous coincidence, since I was looking for ideas on how best to implement logging inside StreamMachine, the rewindable streaming audio server I've been playing with at KPCC.

Bunyan's toolset isn't quite fleshed out at the moment, but the idea is that logging loosely-structured data as JSON is a heck of a lot more functional than squashing everything into strings.

So instead of a typical Shoutcast w3c log line:

96.10.155.166 - - [04/Apr/2012:07:11:50 -0700] "GET /kpcc HTTP/1.1" 200 219606 "-" "VLC/2.0.1 LibVLC/2.0.1" 85

StreamMachine can now log a connection as:

{
"name": "StreamMachine",
"hostname": "dev2.kpcc.org",
"pid": 18605,
"source": "kpcclive",
"level": 20,
"req": {
    "method": "GET",
    "url": "/kpcclive.mp3?socket=11516610901102892326",
    "headers": {
        "host": "scprdev.org:8000",
        "connection": "keep-alive",
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.83 Safari/535.11",
        "accept": "*/*",
        "referer": "http://www.scpr.org/assets-flash/streammachine.swf",
        "accept-encoding": "gzip,deflate,sdch",
        "accept-language": "en-US,en;q=0.8",
        "accept-charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.3"
    }
},
"listeners": 0,
"msg": "Connection end",
"time": "2012-04-05T11:49:38.871Z",
"v": 0
}

(Ignore that I'm not actually including seconds listened or bytes sent in that log line...)

It's wordy, but if the computer is the only one reading, who cares?

I've been writing a lot of log-parsing tools lately, taking in Shoutcast logs and running stats on listening habits like time-of-day and duration. A structured JSON object gives me a lot more lee-way when it comes to adding and removing log data than a w3c-formatted log does because I no longer have to remember that user agent is the third match in my regex results, etc. Calling log.req['user-agent'] is nice and simple.

Given the headaches that trying to make sense from log data already present—and always will—that's a good thing.