In a recent episode of Google's SEO Made Easy series, Search Advocate Martin Splitt shared practical guidance on troubleshooting website crawling issues. The episode covered how Google Search interacts with websites during crawling and how to spot potential problems.
Browser Access Doesn't Guarantee Googlebot Access
Splitt emphasized that just because a page is accessible through a browser doesn't necessarily mean Googlebot can crawl it. Several factors can prevent Googlebot from accessing URLs:
- Robots.txt restrictions (a quick self-check is sketched after this list)
- Firewalls or bot protection systems
- Networking or routing issues between Google's data centers and web servers
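Of these, robots.txt rules are the easiest to rule out yourself. As a rough illustration (not part of Splitt's presentation), Python's standard-library robots.txt parser can check whether a given URL is crawlable for Googlebot; the domain and path below are placeholders:

```python
from urllib import robotparser

# Placeholder site and URL; replace with your own.
robots_url = "https://www.example.com/robots.txt"
page_url = "https://www.example.com/products/widget"

parser = robotparser.RobotFileParser()
parser.set_url(robots_url)
parser.read()  # fetches and parses the live robots.txt file

if parser.can_fetch("Googlebot", page_url):
    print("robots.txt allows Googlebot to crawl this URL")
else:
    print("robots.txt blocks Googlebot from this URL")
```

A pass here only rules out robots.txt; firewalls, bot protection systems, and network issues between Google's data centers and your server still require the checks described below.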
Tools for Verifying Googlebot Access
To properly verify if Googlebot can access your pages, Splitt recommends using:
- The URL Inspection Tool in Google Search Console
- The Rich Results Test
Both tools display the rendered HTML of a page, helping confirm whether Googlebot can properly access the content.
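The Search Console tools remain the authoritative check because they fetch from Google's own infrastructure. That said, a rough local test, sketched below with Python's standard library and a placeholder URL, can surface user-agent-based blocking by comparing responses for a browser user agent and one of Googlebot's published user-agent strings. It does not replicate Googlebot's IP addresses or rendering.

```python
import urllib.error
import urllib.request

# Placeholder URL; replace with the page you want to test.
URL = "https://www.example.com/products/widget"

USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

for label, agent in USER_AGENTS.items():
    request = urllib.request.Request(URL, headers={"User-Agent": agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            print(f"{label}: HTTP {response.status}")
    except urllib.error.HTTPError as exc:
        # A 403 for the Googlebot agent but not the browser agent hints at user-agent blocking.
        print(f"{label}: HTTP {exc.code}")
    except urllib.error.URLError as exc:
        print(f"{label}: request failed ({exc.reason})")
```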
Monitoring Server Responses Through Crawl Stats
The Crawl Stats report provides valuable insights into how servers respond to crawl requests. Website owners should monitor for:
- High numbers of 500 responses
- Fetch errors
- Timeouts
- DNS problems
While transient errors may resolve automatically, frequent occurrences or sudden spikes warrant investigation, particularly for larger sites with millions of pages.
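The Crawl Stats report itself lives in Search Console, but a quick spot-check from your own machine can help confirm whether a reported spike reflects a server that is still struggling. The sketch below, a rough illustration with a placeholder hostname rather than something from the episode, checks DNS resolution and then measures the status code and latency of a single fetch; it cannot reproduce the network path between Google's data centers and your server.

```python
import socket
import time
import urllib.error
import urllib.request

# Placeholder host and URL; replace with your own.
HOST = "www.example.com"
URL = "https://www.example.com/"

# 1. DNS check: can the hostname be resolved at all?
try:
    addresses = sorted({info[4][0] for info in socket.getaddrinfo(HOST, 443)})
    print("DNS resolves to:", addresses)
except socket.gaierror as exc:
    print("DNS problem:", exc)

# 2. Fetch check: status code and response time, with a timeout.
start = time.monotonic()
try:
    with urllib.request.urlopen(URL, timeout=10) as response:
        elapsed = time.monotonic() - start
        print(f"HTTP {response.status} in {elapsed:.2f}s")
except urllib.error.HTTPError as exc:
    print("Server returned an error status:", exc.code)
except urllib.error.URLError as exc:
    print("Fetch error or timeout:", exc.reason)
```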
Advanced Troubleshooting Using Web Server Logs
For more detailed analysis, Splitt suggests examining web server logs, though this may require assistance from hosting providers or development teams. Server logs reveal important patterns about:
- Request timing and frequency
- Server response patterns
- Overall crawling behavior (a parsing sketch follows below)
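As a rough illustration of that kind of analysis, assuming an Apache or nginx access log in the common "combined" format at a placeholder path, the sketch below tallies the status codes served to requests identifying as Googlebot and the busiest hours. It is not a substitute for a proper log analysis tool.

```python
import re
from collections import Counter

# Placeholder path; assumes the Apache/nginx "combined" log format.
LOG_PATH = "/var/log/nginx/access.log"

LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

status_counts = Counter()
hourly_counts = Counter()

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.match(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        status_counts[match.group("status")] += 1
        # e.g. "10/Oct/2024:13:55:36 +0000" -> "10/Oct/2024:13"
        hourly_counts[match.group("time")[:14]] += 1

print("Status codes returned to Googlebot:", status_counts.most_common())
print("Busiest hours:", hourly_counts.most_common(5))
```

Because the filter relies on the user-agent string alone, these counts can also include the impostors described next.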
Importantly, Splitt cautioned that not all requests claiming to be from Googlebot are legitimate: some third-party scrapers spoof Googlebot's user-agent string.
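Google's documentation describes verifying a suspicious requester with a reverse DNS lookup followed by a forward lookup: the IP should resolve to a hostname ending in googlebot.com or google.com, and that hostname should resolve back to the same IP. Google also publishes lists of its crawler IP ranges. A minimal sketch of the DNS check, using a placeholder IP address:

```python
import socket

# Placeholder IP pulled from a log line; replace with the address to verify.
ip_address = "192.0.2.10"

try:
    hostname, _, _ = socket.gethostbyaddr(ip_address)  # reverse DNS lookup
    _, _, forward_ips = socket.gethostbyname_ex(hostname)  # forward lookup
except OSError:
    print(f"{ip_address} does not verify as Googlebot (lookup failed)")
else:
    # The hostname must belong to Google and must resolve back to the same IP.
    if hostname.endswith((".googlebot.com", ".google.com")) and ip_address in forward_ips:
        print(f"{ip_address} verifies as Googlebot ({hostname})")
    else:
        print(f"{ip_address} does NOT verify as Googlebot ({hostname})")
```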