Raw Numbers Mean Nothing
Tuesday, June 21, 2005, at 01:38PM
By Eric Richardson
This morning Sean linked to a comparison Tristan Louis did of how Google and Yahoo report links to blogs.
Tristan takes a look at the number of links Yahoo, Google, and Technorati report as pointing to a site and uses that to infer how well each engine is doing in covering blogs.
I skimmed it this morning, and then just went back and had a nice back and forth with Sean about it. My contention: the comparison is worthless.
Tristan's data shows that Google generally reports only about 3% as many links to the Technorati top 100 blogs as Yahoo does. For my statistically insignificant blogs the difference varies: blog.ericrichardson.com shows 400 links in Google and 2,760 in Yahoo (14.5%); blogdowntown shows 222 and 28,800 (0.8%).
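For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The counts are the ones quoted above; the function name `google_share` is only an illustrative choice, not something from Tristan's comparison.

```python
# Link counts as quoted in this post: (Google, Yahoo).
counts = {
    "blog.ericrichardson.com": (400, 2760),
    "blogdowntown": (222, 28800),
}


def google_share(google_links, yahoo_links):
    """Return Google's reported link count as a percentage of Yahoo's."""
    return 100.0 * google_links / yahoo_links


for site, (google_links, yahoo_links) in counts.items():
    share = google_share(google_links, yahoo_links)
    print(f"{site}: Google reports {share:.1f}% as many links as Yahoo")
```

Both ratios are well above or below the roughly 3% figure for the top 100, which is part of why a single headline percentage says so little.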
The question is whether those 28,800 "links" Yahoo tells me about actually mean anything.
It would seem that they don't, since 24,800 of them come from LA Voice. Now, I appreciate that Mack links me in the sidebar. But why do I care that Yahoo can find close to 25,000 permutations of LA Voice URLs that happen to have my link on them?
The problem with dynamic content is that there are a near-infinite number of ways to access the same information. Back when we were all writing HTML there were a certain number of "pages" on a site. They were files. Today sites that are dynamic don't have a conception of "pages." If you look at the archives, blogdowntown has 294 posts. But who knows how many different URL combinations might allow access to those same pieces of information?
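To make the point concrete, here is a rough sketch of how one might collapse a list of reported linking URLs down to the sites they come from. This is only an illustration, not how Yahoo or Google count anything: the function name `links_by_host` and the sample URLs (on made-up `.example` hosts) are assumptions of mine, and grouping by hostname is just one simple way to spot a single site dominating the count.

```python
from collections import Counter
from urllib.parse import urlsplit


def links_by_host(linking_urls):
    """Count how many of the reported linking URLs come from each host."""
    hosts = Counter()
    for url in linking_urls:
        host = urlsplit(url).hostname or ""
        # Treat "www.example.com" and "example.com" as the same site.
        if host.startswith("www."):
            host = host[4:]
        hosts[host] += 1
    return hosts


# Hypothetical sample: several URL permutations of one site, plus one other blog.
sample = [
    "http://lavoice.example/",
    "http://lavoice.example/archives/2005/06/",
    "http://lavoice.example/?p=12",
    "http://www.lavoice.example/category/downtown/",
    "http://otherblog.example/2005/06/some-post.html",
]

per_host = links_by_host(sample)
print(len(sample), "linking URLs from", len(per_host), "distinct hosts")
for host, count in per_host.most_common():
    print(f"  {host}: {count}")
```

In this toy sample, five "links" turn out to be two sites, and four of the five are the same sidebar link seen through different URLs.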
I can create a site that has 50,000 "pages" but very little content just as easily as I can create a site that has 100 pages of good content (well, probably easier... 100 pages of good content takes time). To have the former site indexed more fully only increases the noise in the index.
There's no assurance that more entries in the index mean that an engine is hitting more information. And in the end, that's what matters: information. I don't care about raw numbers -- ever. Raw numbers are worthless. Raw "link" counts are worthless. They might be interesting to look at, but I would say that they have no connection to the reality of how comprehensively any engine is indexing the web. The dynamic reality of most blog software only exaggerates this disconnect.