Every now and then, like every insecure self-respecting blogger, I wonder: how the hell do I find out how many people are actually reading my blog, consarn it!
Yes, this is another blog entry related to blogging.
Numbers reported by services like Bloglines don’t seem so helpful since a) they only represent a fraction of readers and b) they only report subscribers, not active readers. I imagine a lot of people stop using such services after a short time, or only log in occasionally.
Aggregated web stats (as provided by my web host) aren’t too helpful either, as the numbers vary wildly depending on what is considered a hit and what is considered a visit (the latter is the number that counts).
I do have a rough idea of how many people visit this site, based on a counter which is fairly pessimistic compared to most because it is based only on background image loads. These are a lower priority for browsers and will tend to generate fewer hits than normal images (often you have to do a "forced" reload to see an updated background image). This counter tells me that I currently get around 120 visitors per day, but that still doesn’t really tell me how many different people visit over the course of, say, a week. Nor does it tell me how many people have read a particular post; it just means they’ve hit a page somewhere on my site.
And so I realize that the only way to get an accurate sampling is to analyze the server logs directly. This is a pain, but it can also be very informative.
Totally anal method for estimating blog readership:
- Locate and download a recent server log from your web host. I can access mine as monthly gzipped archives, so I downloaded June.
- Pick an entry containing an image that was posted two weeks before the end of that log (any longer than that means the post is probably a little stale anyway). I chose this one, an unremarkable post which happens to consist of a single image.
- Search the log for occurrences of the image file name, in this case 555.gif. I count 486, which means an absolute maximum of 486 people viewed that post within two weeks.
- Note that many of the accesses are in fact HTTP 304s (Not Modified), which means the browser already had a cached copy: they are caused by people refreshing their browsers or returning to a page. Ignore those and count only the 200s. That reduces my number to 323.
- Note that some of these loads were not complete (this can be checked by looking at the byte count), which means they were probably cancelled or interrupted, so ignore these as well. My total is now 308.
- Now have a look at the IP addresses, and be dismayed at how many duplicates there are, despite all this culling! Be ruthless and assume that any addresses starting with the same three numbers (i.e. in the same class C address range) are probably from the same person, and ignore those as well. This leaves me with a final tally of 217.
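For anyone who would rather script the steps above than do them with find & replace, here is a rough Python sketch. It is not how the numbers in this post were produced, and it assumes the log is in the Apache/NCSA "combined" format, which your host may or may not use. The names `LOG_LINE` and `unique_views` and the `full_size` parameter (the image's real size in bytes) are made up for illustration.

```python
import re

# ASSUMPTION: Apache/NCSA "combined" log format. Adjust the pattern if your
# host writes its logs differently.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referrer>[^"]*)" "[^"]*"'
)


def unique_views(lines, image_name, full_size):
    """Keep one log entry per distinct three-number IP prefix that fully loaded the image."""
    seen = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or not m["path"].endswith(image_name):
            continue  # not a request for the image
        if m["status"] != "200":
            continue  # drop 304s (cached copies) and any errors
        if m["size"] == "-" or int(m["size"]) < full_size:
            continue  # transfer was cancelled or interrupted
        # Treat addresses sharing their first three numbers as one reader.
        prefix = ".".join(m["ip"].split(".")[:3])
        seen.setdefault(prefix, m.groupdict())
    return list(seen.values())
```

With a gzipped monthly log, something like `unique_views(gzip.open("access_log.gz", "rt"), "555.gif", full_size=1234)` would return the surviving entries, where the file name and byte count are placeholders for your own values. Each entry keeps its `referrer` field, which is handy for the next step.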
So now I think I can be reasonably confident that around 200 individuals will see a given entry. Furthermore, I can now use these results to work out what proportion of Bloglines subscribers actually saw that post via Bloglines, by checking the referrers. Here’s the referrer breakdown for those remaining 217 accesses:
| Referrer (how entry was viewed) | Accesses |
|---|---|
| intepid.com | 179 |
| Unknown (no referrer info) | 21 |
| Bloglines (reports 47 subscribers) | 15 |
| Google Reader | 2 |
| NewsGator (reports 13 subscribers) | 0 |
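A breakdown like this can also be scripted. The sketch below is only a guess at how one might bucket referrer strings: the host-name checks for Bloglines, Google Reader and NewsGator are assumptions about what those services send rather than anything taken from the log, and the "Other" bucket is a fallback that does not appear in the table. The last few lines redo the arithmetic on the table's own counts.

```python
from collections import Counter
from urllib.parse import urlparse

OWN_HOST = "intepid.com"


def classify(referrer):
    """Map a raw referrer string to one of the table's rows (host checks are assumptions)."""
    if referrer in ("", "-"):
        return "Unknown (no referrer info)"
    parts = urlparse(referrer)
    host = (parts.hostname or "").lower()
    if host == OWN_HOST or host.endswith("." + OWN_HOST):
        return OWN_HOST
    if "bloglines" in host:
        return "Bloglines"
    if host.endswith("google.com") and parts.path.startswith("/reader"):
        return "Google Reader"
    if "newsgator" in host:
        return "NewsGator"
    return "Other"  # fallback bucket, not a row in the table


def breakdown(referrers):
    """Count how many accesses fall into each bucket."""
    return Counter(classify(r) for r in referrers)


def share(count, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts copied from the table above.
table = {"intepid.com": 179, "Unknown": 21, "Bloglines": 15,
         "Google Reader": 2, "NewsGator": 0}
total = sum(table.values())  # 217
feeds = table["Bloglines"] + table["Google Reader"] + table["NewsGator"]  # 17

print(share(feeds, total))                     # 7.8  (known feed readers only)
print(share(feeds + table["Unknown"], total))  # 17.5 (if every unknown were a feed reader)
```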
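What’s surprising here is the large discrepancy between people who read via browsing and those who read via feeds. According to these numbers, fewer than 20% of the people who saw that entry were using an aggregator or news reader: 17 of the 217 accesses (about 8%) came from a known feed reader, and even if every access with no referrer info is counted as a feed reader the figure only rises to 38 (about 18%).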
A final note: as painstaking as this process sounds, it doesn’t actually take that long if you’re comfortable with text find & replace operations. Writing this entry took a lot longer than doing the calculations.