Every time a page on your website is requested, whether by a human, another program, or an automated bot, the event is recorded in a log file stored on the web server. If you have installed Google Analytics, Google will tell you about your visitors and pages: how many users you have, which pages are most popular, and so on. A log file analyzer gives you similar insights into your visitor statistics and page analytics.
So what is the difference between Google Analytics and Log File Analysis?
What do you do with the information that a typical Log File Analyser generates?
The main difference is that Google Analytics tracks only the pages that have the GA code added to them. It misses activity on any page that does not carry the GA code, because it simply cannot see those pages and so cannot report on them.
Logs keep track of every request that reaches your website and web server, so this type of analysis is very accurate compared with tag-based tracking.
What is Technical SEO and how does it relate to Log File Analysis?
SEO is generally about the content of your website: how much content it has, which keywords it targets, its titles, and so on. The term "Technical SEO" covers all SEO activity that is not related to the content of your website but still helps improve your website's bottom line.
Some of the immediate insights that Technical SEO activity reveals are:
- Which pages are the most active
- Which pages are not known to Google Search or Analytics
- Whether Google and other search engines are crawling regularly, and how many pages they have indexed
- Any security holes in your website
- Your site architecture
- How search engine bots spend the crawl budget
The list goes on.
What is a Crawl Budget?
Every time Google or Bing visits your website, it sends an automated bot to crawl it. The bot spends only a certain amount of time on your site, and during that time it checks for new content or pages. This limited amount of time that each bot spends is the "Crawl Budget".
If there are too many errors, pages load slowly, or other negative factors pile up, search bots will visit your website less often. If they spend less time on your site, chances are that new updates or new pages you add to the website or blog will not get indexed by Google or Bing. This makes Crawl Budget a crucial component of a site's architecture, and it makes Log File Analysis a top priority for any SEO activity. Crawl Budget and crawl deficiencies are uncovered only by analyzing your Apache, Nginx, or IIS log files. Google Analytics or webmaster tools will tell you about your error pages but very little about the bots.
How to perform log file analysis
For this article, we took a sample raw log file from Apache, manually converted it into a CSV file, and loaded it into a MySQL table for analysis.
Once the log data is in the database, we can analyze it in depth. Here are the insights and dashboards that we generated from the log data.
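The conversion step can also be scripted instead of done by hand. The following is a minimal sketch, not the method used for this article: it assumes the log uses Apache's common or "combined" format, and the names `LOG_PATTERN`, `FIELDS`, `parse_line`, and `logs_to_csv` are made up for illustration. Lines that do not match the assumed format are skipped.

```python
import csv
import re

# Assumed Apache "common"/"combined" layout; adjust if your LogFormat differs.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)(?: (?P<protocol>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Hypothetical column names for the CSV file (and later the database table).
FIELDS = ["ip", "time", "method", "url", "status", "bytes", "referrer", "agent"]


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    row = {name: match.group(name) or "" for name in FIELDS}
    # Apache writes "-" when no response body was sent; store that as 0.
    if row["bytes"] == "-":
        row["bytes"] = "0"
    return row


def logs_to_csv(lines, out):
    """Write all parseable lines to `out` as CSV and return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    written = 0
    for line in lines:
        row = parse_line(line)
        if row is not None:
            writer.writerow(row)
            written += 1
    return written
```

A typical call would open the raw log for reading and a CSV file for writing (with `newline=""`), then pass both to `logs_to_csv`. The resulting file could then be imported into a MySQL table, for example with a `LOAD DATA INFILE` statement.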
Analyzing bot data directly reveals the "Crawl Budget" insight:
- Total Crawl Events : Total number of crawl events by each bot
- Crawl Bandwidth : Number of bytes consumed by each bot
- Unique URL Crawl : Number of unique URLs visited by each bot
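The three metrics above can be sketched in a few lines of Python. This is an illustration, not the dashboard's implementation: it assumes each log record is a dict with `agent`, `url`, and `bytes` keys, and it labels bots with a simple substring match on the user agent. `BOT_SIGNATURES`, `bot_name`, and `crawl_metrics` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical user-agent substrings used to label a request as a known bot.
BOT_SIGNATURES = {"Googlebot": "googlebot", "Bingbot": "bingbot"}


def bot_name(agent):
    """Return the bot label for a user-agent string, or None for other traffic."""
    lowered = agent.lower()
    for name, needle in BOT_SIGNATURES.items():
        if needle in lowered:
            return name
    return None


def crawl_metrics(rows):
    """Aggregate crawl events, bytes, and unique URLs per bot."""
    stats = defaultdict(lambda: {"events": 0, "bytes": 0, "urls": set()})
    for row in rows:
        name = bot_name(row["agent"])
        if name is None:
            continue  # not a recognized bot
        entry = stats[name]
        entry["events"] += 1                # Total Crawl Events
        entry["bytes"] += int(row["bytes"])  # Crawl Bandwidth
        entry["urls"].add(row["url"])        # Unique URL Crawl
    return {
        name: {
            "events": entry["events"],
            "bytes": entry["bytes"],
            "unique_urls": len(entry["urls"]),
        }
        for name, entry in stats.items()
    }
```

A substring match trusts whatever the client claims to be, so these numbers include any program that merely pretends to be a search engine bot.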
The main pivot table shows the above metrics; clicking any colored cell reveals details on the right-hand side about each URL that was crawled. At the bottom of the page you will see the bot frequency: how often each bot arrived on your site and on which dates.
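The bot frequency view boils down to counting a bot's requests per calendar day. Here is a rough sketch, assuming each record is a dict whose `time` value uses Apache's default timestamp layout (such as `10/Oct/2023:13:55:36 +0000`) and whose `agent` value holds the user agent. The function name `visits_per_day` is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Apache's default timestamp layout, e.g. "10/Oct/2023:13:55:36 +0000".
APACHE_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def visits_per_day(rows, agent_substring):
    """Count requests per date for user agents containing `agent_substring`."""
    needle = agent_substring.lower()
    days = Counter()
    for row in rows:
        if needle in row["agent"].lower():
            stamp = datetime.strptime(row["time"], APACHE_TIME_FORMAT)
            days[stamp.date().isoformat()] += 1
    # Sort by date so the result reads as a timeline.
    return dict(sorted(days.items()))
```

Calling `visits_per_day(rows, "googlebot")` returns a mapping from ISO dates to hit counts. A date missing from the mapping means the bot did not show up on that day.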
One crucial piece of information revealed by log file analysis concerns fake bots or programs that scrape content from your website. We even found attempts to access pages that do not exist on the website. This suggests that some automated program is probing for security loopholes or a backdoor entry.
Next, we will look into page metrics, browser analytics, and several other dashboards.
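One simple way to surface this kind of probing is to count, per client address, the requests that ended in a 404 response. The sketch below assumes each record is a dict with `ip` and `status` keys. The function name `missing_page_hits` and the `min_hits` threshold are hypothetical and would need tuning for a real site.

```python
from collections import Counter


def missing_page_hits(rows, min_hits=3):
    """Return (ip, count) pairs for clients with at least `min_hits` 404 responses."""
    counts = Counter(row["ip"] for row in rows if row["status"] == "404")
    # most_common() orders clients from most to fewest missing-page requests.
    return [(ip, hits) for ip, hits in counts.most_common() if hits >= min_hits]
```

A high count alone does not prove malicious intent, since broken internal links also produce 404 responses, so the requested URLs still need a manual look.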