Googlebot is not a single crawler but part of a centralized crawling platform used by various Google services, such as Google Search, Google Shopping, and AdSense. The name "Googlebot" is a holdover from the days when Google operated only one crawler. Each client using the platform sets its own fetch parameters, including user-agent strings and byte limits for fetched URLs.
Byte Limits and Fetching Behavior
Googlebot fetches up to 2MB per URL (excluding PDFs, which have a 64MB limit). Other crawlers have different limits, with a default of 15MB for those without specific settings. When a page exceeds 2MB, Googlebot fetches only the first 2MB, including HTTP headers, and ignores any remaining bytes. This partial fetch is treated as the complete file for indexing and rendering purposes. Resources referenced within the HTML (except media and fonts) are fetched separately, each with its own byte limit.
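The truncation behavior described above can be sketched in a few lines of Python. The helper name and the synthetic page are mine; only the 2MB figure comes from the episode:

```python
# Sketch: Googlebot-style truncation. When a response exceeds the limit,
# only the first `limit` bytes are kept and treated as the complete file.
GOOGLEBOT_HTML_LIMIT = 2 * 1024 * 1024  # 2MB, per the episode

def truncated_fetch(body: bytes, limit: int = GOOGLEBOT_HTML_LIMIT) -> bytes:
    """Return the portion of `body` a 2MB-limited fetcher would see."""
    return body[:limit]

# A 3MB page: everything past the 2MB mark is invisible to the crawler.
page = b"<html>" + b"x" * (3 * 1024 * 1024)
seen = truncated_fetch(page)
assert len(seen) == GOOGLEBOT_HTML_LIMIT
```

Everything past the cutoff simply never reaches the indexing pipeline, which is why the position of content within the file matters.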
Implications for Web Content
Most web pages are well below the 2MB limit, but pages with large inline base64 images, extensive inline CSS/JavaScript, or large menus risk pushing critical content beyond the cutoff. Content beyond the 2MB limit is not fetched, rendered, or indexed, effectively making it invisible to Googlebot.
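A rough way to check whether a page is at risk is to compare the byte offset of critical tags against the cutoff. This sketch (the oversized inline blob and the helper are synthetic, for illustration only) shows a late canonical link falling past the limit:

```python
LIMIT = 2 * 1024 * 1024  # 2MB cutoff from the text above

def first_offset(html: bytes, needle: bytes) -> int:
    """Byte offset of `needle` in `html`, or -1 if absent."""
    return html.find(needle)

# Pad the body with a large inline base64-style blob, then place the
# canonical link after it -- past the 2MB mark.
blob = b'<img src="data:image/png;base64,' + b"A" * (2 * 1024 * 1024) + b'">'
html = (b"<html><head><title>ok</title></head><body>"
        + blob
        + b'<link rel="canonical" href="/">'
        + b"</body></html>")

assert first_offset(html, b"<title") < LIMIT           # title survives
assert first_offset(html, b'rel="canonical"') > LIMIT  # canonical is cut off
```

The same check could be run against a real page's raw bytes to spot tags that a truncated fetch would drop.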
Rendering Process
After fetching, the Web Rendering Service (WRS) processes the retrieved bytes by executing JavaScript and CSS to understand the page’s final visual and textual state. WRS does not fetch images or videos and applies the 2MB limit per resource. It operates statelessly, clearing local storage and session data between requests, which can affect how dynamic JavaScript elements are interpreted.
Best Practices for Webmasters
- Keep HTML lean: Move heavy CSS and JavaScript to external files, as these are fetched separately.
- Order critical elements early: Place meta tags, <title>, <link> elements, canonical tags, and essential structured data near the top of the HTML so they are not cut off.
- Monitor server performance: Slow server responses cause fetchers to back off, reducing crawl frequency.
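As a rough diagnostic for the first point, one could total the bytes spent on inline <script> and <style> bodies. This regex-based sketch is my own and only approximate (a regex is not a full HTML parser), but it illustrates the idea:

```python
import re

# Matches inline <script>...</script> and <style>...</style> bodies.
# External scripts (<script src="...">) have empty bodies and add nothing.
_INLINE = re.compile(r"<(script|style)[^>]*>(.*?)</\1>", re.S | re.I)

def inline_weight(html: str) -> int:
    """Approximate bytes consumed by inline script and style bodies."""
    return sum(len(m.group(2).encode()) for m in _INLINE.finditer(html))

sample = "<style>" + "a" * 50 + "</style><script src='app.js'></script>"
assert inline_weight(sample) == 50
```

Pages where this number dominates the total size are good candidates for moving code into external files, which are fetched under their own byte limits.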
Additional Notes
The 2MB limit is not fixed and may evolve as web content changes. Crawling is a complex process that runs at very large scale, and understanding these byte limits helps ensure important content remains accessible to Googlebot.
The information is based on insights shared in episode 105 of the Search Off the Record podcast, posted by Gary.