Friday's posting in the Importance of Log Analysis series looked at what to actually look for in your log analysis. Since writing the posting, I've been inundated by emails asking for more practical and actionable items. Okay, so there were only three emails, but it appears that more examples were required.
Before we look at these, I first want to cover a few important facts to take into account.
Too much data?
Most log analysis applications will provide you with a massive amount of information.
It's important to understand that not all of the information is useful, and sometimes the terminology used can be misleading.
The most common example is the number of hits.
The number of hits to a page does not mean the number of visitors.
If a simple HTML page consists of a few paragraphs of text and two graphic images, then one person viewing this page (one time only) will count as three hits. 1 page + 2 images = 3 hits.
So in a more realistic example, if an HTML page has 15 images, one person viewing the page one time only will count as 16 hits: 1 page + 15 images. You get the idea.
Page Views is a more accurate figure, as this counts the number of times that separate pages are viewed. So one person viewing three pages = three page views.
However, this too can be misleading.
15 page views could consist of one person looking at 15 pages. It could also be 15 different people each looking at only one of your pages. Or, depending on how their browser and your server are set up, it could be one person viewing a small number of pages several times, going back and forth between them.
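To make the difference between hits and page views concrete, here is a minimal sketch in Python. It assumes a hypothetical rule that only requests ending in .html or .htm (or with no extension at all) count as pages; real log analysis applications each apply their own rules, and the function names here are made up for illustration.

```python
import posixpath
from urllib.parse import urlparse

# Assumption: only these extensions (or no extension) count as "pages".
# Real log analysis applications each use their own rules.
PAGE_EXTENSIONS = {"", ".html", ".htm"}


def is_page(requested_path):
    """Return True if the request looks like a page rather than an image or other asset."""
    extension = posixpath.splitext(urlparse(requested_path).path)[1].lower()
    return extension in PAGE_EXTENSIONS


def count_hits_and_page_views(requested_paths):
    """Every request is a hit; only requests for pages are page views."""
    hits = len(requested_paths)
    page_views = sum(1 for path in requested_paths if is_page(path))
    return hits, page_views


# One person viewing one page with two images, one time only.
single_view = ["/index.html", "/images/logo.gif", "/images/photo.jpg"]
print(count_hits_and_page_views(single_view))  # (3, 1)
```

The same visit shows up as three hits but only one page view, which is why the hits figure on its own says very little about how many people came by.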
Another example is the number of visitors. These are usually defined by IP addresses, but different applications will have different definitions. Application one may recognise my IP address and, however many times I come back to the website, count me as just one visitor. Application two may define a set period of time, after which I will count as a second visitor. Application three may differentiate between visitors and unique visitors, and so on.
So to understand what you're looking at, you need to understand the terminology, understand how your server works, and also understand how your log analysis software is set up.
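As a rough illustration of how two definitions of "visitor" can give different figures from the same log, here is a small Python sketch. The 30 minute timeout, the function names, and the sample requests are all assumptions made up for this example; they are not taken from any particular log analysis application.

```python
def count_unique_ips(requests):
    """Definition one: each IP address counts as one visitor, however often it returns."""
    return len({ip for ip, _ in requests})


def count_visits(requests, timeout_seconds=1800):
    """Definition two: the same IP counts again once it has been quiet for longer
    than timeout_seconds (an assumed 30 minutes by default)."""
    last_seen = {}
    visits = 0
    # Walk the requests in time order so gaps are measured correctly.
    for ip, timestamp in sorted(requests, key=lambda request: request[1]):
        previous = last_seen.get(ip)
        if previous is None or timestamp - previous > timeout_seconds:
            visits += 1
        last_seen[ip] = timestamp
    return visits


# Made-up sample: (IP address, seconds since the start of the log).
sample_requests = [
    ("203.0.113.5", 0),
    ("203.0.113.5", 600),    # same visit, 10 minutes later
    ("198.51.100.7", 900),   # a different IP address
    ("203.0.113.5", 7200),   # first IP returns two hours later
]

print(count_unique_ips(sample_requests))  # 2
print(count_visits(sample_requests))      # 3
```

Same log, two honest answers: two visitors by the first definition, three by the second. Neither is wrong, but you need to know which one your software is reporting.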
Different analysis software, different results.
Here's an interesting fact. If you take one month's worth of log files, and run them through five different log analysis applications, you'll get five different sets of figures. None will be a perfect match with each other. Some will be quite close to each other, but some of them will differ considerably.
The question is what to do about it, and how do you know which ones to trust?
My own solution is a simple one. I know from experience which ones more or less agree with each other, and can safely assume that these are therefore more or less accurate.
I can also assume that the irregularities displayed by any of the applications will be more or less consistent. In other words the figures may not be 100% accurate, but they'll be close enough. And if I keep using the same application with the same server's log files, I can rely on what they're telling me.
Tomorrow we'll take a look at some practical and actionable things you can do with your log analysis data.