be seen be sold - promoting software since 1997.
Shareware Promotions      
 
The Importance of Log Analysis part V - understanding the data
February 28, 2005
 
Friday's posting in the Importance of Log Analysis series covered what to look for in your log analysis. Since writing the posting, I've been inundated by emails asking for more practical and actionable items. Okay, so there were only three emails, but it appears that more examples were required.

Before we look at these, I first want to look at a few important facts to take into account.

Too much data?

Most log analysis applications will provide you with a massive amount of information.

It's important to understand that not all of the information is useful, and sometimes the terminology used can be misleading.

The most common example is the number of hits.

The number of hits to a page does not mean the number of visitors.

If a simple HTML page consists of a few paragraphs of text and two graphic images, then one person viewing this page (one time only) will count as three hits. 1 page + 2 images = 3 hits.

So in a more realistic example, if an HTML page has 15 images, one person viewing the page one time only will count as 16 hits. You get the idea.

Page views are a more useful figure, as this counts the number of times that pages themselves are viewed, ignoring the images and other files they pull in. So one person viewing three pages = three page views.
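
To make the difference between hits and page views concrete, here is a minimal sketch in Python. It assumes the log is in Common Log Format and that anything ending in .html, .htm or a slash counts as a page. The sample lines, the addresses and the list of page extensions are made up for illustration, and your own log analysis application may well draw the line somewhere else.

```python
import re

# Hypothetical sample: one person viewing one page that contains two images.
SAMPLE_LOG = """\
192.0.2.10 - - [28/Feb/2005:10:00:00 +0000] "GET /index.html HTTP/1.0" 200 5120
192.0.2.10 - - [28/Feb/2005:10:00:01 +0000] "GET /images/logo.gif HTTP/1.0" 200 2048
192.0.2.10 - - [28/Feb/2005:10:00:01 +0000] "GET /images/box.jpg HTTP/1.0" 200 4096
"""

# Pulls the requested path out of the quoted request part of a log line.
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+)')

# Assumption: only these endings count as "pages"; everything else is just a hit.
PAGE_SUFFIXES = (".html", ".htm", "/")


def count_hits_and_page_views(log_text):
    """Return (hits, page_views) for the given log text."""
    hits = 0
    page_views = 0
    for line in log_text.splitlines():
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip lines that do not look like requests
        hits += 1  # every requested file is a hit
        path = match.group(1).split("?", 1)[0]  # ignore any query string
        if path.lower().endswith(PAGE_SUFFIXES):
            page_views += 1  # only pages count as page views
    return hits, page_views


hits, page_views = count_hits_and_page_views(SAMPLE_LOG)
print(hits, page_views)  # 3 hits, 1 page view
```

The same visit shows up as three hits but only one page view, which is the 1 page + 2 images = 3 hits example from above.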

However, this too can be misleading.

15 page views could consist of one person looking at 15 pages. It could also be 15 different people each looking at only one of your pages. It could also, depending on how their browser and your server are set up, consist of one person viewing a small number of pages several times, going back and forth between them.

Another example is the number of visitors. These are usually defined by IP addresses, but different applications will have different definitions. Application one may recognise my IP address and, however many times I come back to the website, count me as just one visitor. Application two may define a set period of time, after which I will count as a second visitor. Application three may differentiate between visitors and unique visitors, and so on.
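
As a rough illustration of how two definitions give two different answers from the same data, the sketch below counts visitors in two ways: once as distinct IP addresses, and once as visits separated by a period of inactivity. The request list, the function names and the 30 minute timeout are all assumptions chosen for the example, not the settings of any particular application.

```python
from datetime import datetime, timedelta

# Hypothetical (IP address, time of request) pairs for one day.
REQUESTS = [
    ("192.0.2.10", datetime(2005, 2, 28, 9, 0)),
    ("192.0.2.10", datetime(2005, 2, 28, 9, 10)),
    ("192.0.2.10", datetime(2005, 2, 28, 15, 0)),   # same person, back after lunch
    ("198.51.100.7", datetime(2005, 2, 28, 9, 5)),
]


def unique_visitors(requests):
    """Definition one: each distinct IP address is one visitor."""
    return len({ip for ip, _ in requests})


def visits(requests, timeout=timedelta(minutes=30)):
    """Definition two: an IP counts again after `timeout` of inactivity."""
    last_seen = {}
    count = 0
    for ip, when in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(ip)
        if previous is None or when - previous > timeout:
            count += 1  # first request, or first request after a long gap
        last_seen[ip] = when
    return count


print(unique_visitors(REQUESTS))  # 2
print(visits(REQUESTS))           # 3
```

Both figures are "the number of visitors" for the same four requests, which is why it pays to check which definition your software uses.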

So to understand what you're looking at, you need to understand the terminology, understand how your server works, and also understand how your log analysis software is set up.

Different analysis software, different results.

Here's an interesting fact. If you take one month's worth of log files, and run them through five different log analysis applications, you'll get five different sets of figures. None will be a perfect match with each other. Some will be quite close to each other, but some of them will differ considerably.

The question is what to do about it, and how to know which ones to trust.

My own solution is a simple one. I know from experience which ones more or less agree with each other, and can safely assume that these are therefore more or less accurate.

I can also assume that the irregularities displayed by any of the applications will be more or less consistent. In other words the figures may not be 100% accurate, but they'll be close enough. And if I keep using the same application with the same server's log files, I can rely on what they're telling me.
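
A quick sketch of why a consistent error matters less than it sounds. It assumes, purely for illustration, that an application overcounts by the same proportion every month. All of the figures are invented, and real applications will not be this tidy.

```python
def percent_change(previous, current):
    """Month-on-month change as a percentage of the previous month."""
    return (current - previous) / previous * 100


# Hypothetical "true" visitor numbers for two months.
true_january, true_february = 1000, 1200

# Hypothetical application that reports 10% too many visitors every month.
reported_january, reported_february = 1100, 1320

true_trend = percent_change(true_january, true_february)
reported_trend = percent_change(reported_january, reported_february)

# The absolute figures disagree, but the trend they show is the same.
print(round(true_trend, 1), round(reported_trend, 1))
```

Under that assumption the totals are off, but the growth from one month to the next comes out the same, which is what you need for spotting changes over time.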

Tomorrow we'll take a look at some practical and actionable things you can do with your log analysis data.
