Google has reorganized its URL structure documentation to provide clearer guidance for website owners looking to optimize their sites for search crawling. The updated documentation maintains the same behavioral requirements but presents the information in a more accessible format with enhanced real-world examples.
Core Requirements for Crawlable URLs
Google emphasizes that websites must follow the IETF STD 66 standard (RFC 3986) for URL structure. The search engine requires reserved characters to be percent-encoded to ensure proper crawling. Sites that fail to meet these criteria may experience inefficient crawling, including extremely high crawl rates or complete crawling failures.
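In practice, percent-encoding is handled by standard library functions rather than by hand. As a rough sketch (the query value here is an invented example), JavaScript's built-in encodeURIComponent escapes the reserved characters STD 66 defines:

```typescript
// Percent-encode a query value containing "&", a reserved character
// that would otherwise be read as a parameter separator.
const value = "dresses&skirts"; // invented example value
const url = `https://example.com/search?q=${encodeURIComponent(value)}`;
console.log(url); // https://example.com/search?q=dresses%26skirts
```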
The updated guidance specifically warns against using URL fragments to change content, as Google Search generally doesn't support this approach. Instead, developers should implement the History API when using JavaScript to modify page content.
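The History API lets JavaScript update the URL path itself rather than the fragment, so each content state gets a crawlable address. A minimal sketch, where loadProductList stands in for an app's own (hypothetical) rendering code:

```typescript
declare function loadProductList(category: string): void; // assumed app code

// Update the URL with pushState instead of setting location.hash.
function showCategory(category: string): void {
  history.pushState({ category }, "", `/category/${encodeURIComponent(category)}`);
  loadProductList(category); // re-render the page content
}

// Restore content when the user navigates with back/forward buttons.
window.addEventListener("popstate", (event) => {
  const state = event.state as { category?: string } | null;
  if (state?.category) loadProductList(state.category);
});
```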
Parameter Encoding Standards
Google recommends standard encoding practices for URL parameters: an equals sign (=) separates key-value pairs, and an ampersand (&) joins additional parameters. For multiple values within the same key, developers can use characters such as commas (,) that don't conflict with the IETF standard.
The documentation contrasts recommended formats such as https://example.com/category?category=dresses&sort=low-to-high&sid=789 with problematic formats that use colons and brackets as parameter separators.
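Developers rarely need to assemble these strings by hand; the WHATWG URL API produces the recommended format automatically. A small sketch with invented parameter values:

```typescript
const url = new URL("https://example.com/category");
url.searchParams.set("category", "dresses");
url.searchParams.set("sort", "low-to-high");
// Multiple values under one key, joined with a comma; the serializer
// percent-encodes it as %2C, which decodes back to the same comma.
url.searchParams.set("color", ["blue", "green"].join(","));
console.log(url.toString());
// https://example.com/category?category=dresses&sort=low-to-high&color=blue%2Cgreen
```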
Best Practices for URL Construction
Descriptive and Human-Readable URLs
Google advocates for URLs that use descriptive words rather than long ID numbers, recommending https://example.com/wiki/Aviation over complex, parameter-heavy URLs that are difficult for both users and search engines to understand.
Language and Character Encoding
The documentation emphasizes using the audience's language in URLs. For German audiences, Google suggests German words, as in https://example.com/lebensmittel/pfefferminz, while Japanese audiences should see Japanese characters in URLs.
UTF-8 encoding is required for non-ASCII characters. The updated guidance provides specific examples of proper encoding, such as converting https://example.com/نعناع/بقالة to https://example.com/%D9%86%D8%B9%D9%86%D8%A7%D8%B9/%D8%A8%D9%82%D8%A7%D9%84%D8%A9.
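Modern URL libraries apply this UTF-8 percent-encoding automatically. For instance, the WHATWG URL constructor (available in browsers and Node.js) normalizes a non-ASCII path on parsing:

```typescript
const url = new URL("https://example.com/نعناع/بقالة");
console.log(url.href); // non-ASCII path segments come back percent-encoded as UTF-8:
// https://example.com/%D9%86%D8%B9%D9%86%D8%A7%D8%B9/%D8%A8%D9%82%D8%A7%D9%84%D8%A9
```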
Word Separation and Case Sensitivity
Google recommends using hyphens (-) instead of underscores (_) to separate words in URLs, which helps both users and search engines identify concepts. The documentation also notes that URL handling is case-sensitive, treating /APPLE and /apple as distinct URLs.
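A typical way to follow both recommendations at once is a slug helper that lowercases titles and joins words with hyphens. This is a hypothetical sketch, not code from Google's documentation:

```typescript
function toSlug(title: string): string {
  return title
    .trim()
    .toLowerCase()               // avoid /APPLE vs /apple duplicates
    .replace(/[^a-z0-9]+/g, "-") // hyphens, not underscores, between words
    .replace(/^-+|-+$/g, "");    // strip leading/trailing hyphens
}

console.log(toSlug("Summer Clothing: Dresses")); // "summer-clothing-dresses"
```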
Multi-Regional Site Considerations
For sites serving multiple regions, Google suggests URL structures that facilitate geotargeting, such as country-specific domains like https://example.de or country-specific subdirectories like https://example.com/de/.
Common URL Problems and Solutions
Additive Filtering Issues
The documentation addresses problems with additive filtering systems that create unnecessarily high numbers of URLs. These systems can generate multiple views of the same content, leading to crawling inefficiencies. Google provides examples of hotel search results that multiply exponentially when filters combine.
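The arithmetic behind the blow-up is simple: with n independent on/off filters, every subset of filters is a distinct URL, so the URL count doubles with each filter added. A rough illustration with invented filter names:

```typescript
const filters = ["pool", "wifi", "parking", "breakfast", "pets", "gym"];
const combinations = 2 ** filters.length; // each filter is independently on or off
console.log(`${filters.length} filters -> ${combinations} distinct filter URLs`);
// 6 filters -> 64 distinct filter URLs
```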
Parameter-Related Problems
Several parameter types can cause crawling issues:
Referral parameters that track user sources can create duplicate URLs. Shopping sorting parameters that change display order without altering core content also contribute to URL proliferation. Session IDs embedded in URLs are particularly problematic, with Google recommending cookies as an alternative.
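One common mitigation is to normalize links (for example, when generating canonical URLs) by stripping parameters that don't change the core content. A sketch using the URL API; the parameter names ref, sort, and sid are assumptions for illustration:

```typescript
function canonicalize(raw: string): string {
  const url = new URL(raw);
  // Drop tracking, sorting, and session parameters (assumed names).
  for (const param of ["ref", "sort", "sid"]) {
    url.searchParams.delete(param);
  }
  return url.toString();
}

console.log(canonicalize("https://example.com/category?category=dresses&sort=low-to-high&sid=789"));
// https://example.com/category?category=dresses
```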
Calendar and Navigation Issues
Dynamically generated calendars can create infinite URL spaces by linking to unlimited future and past dates. Google recommends adding the nofollow attribute to links pointing to dynamically created future calendar pages.
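In client-rendered calendars, that can be as simple as setting rel when the "next month" link is generated. A hypothetical DOM sketch:

```typescript
const next = new Date();
next.setDate(1);                    // avoid month-overflow edge cases
next.setMonth(next.getMonth() + 1); // point at the following month

const link = document.createElement("a");
link.href = `/calendar?month=${next.getFullYear()}-${String(next.getMonth() + 1).padStart(2, "0")}`;
link.rel = "nofollow"; // keeps crawlers out of the endless future-date space
link.textContent = "Next month";
document.body.appendChild(link);
```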
Broken relative links can also create infinite spaces when servers don't respond with appropriate HTTP status codes for nonexistent pages.
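The fix is to make sure the server answers unknown paths with a genuine 404 rather than a "soft" 200. A minimal Node.js sketch; the route list is an assumption for illustration:

```typescript
import { createServer } from "node:http";

const knownPaths = new Set(["/", "/wiki/Aviation"]); // assumed routes

createServer((req, res) => {
  if (req.url !== undefined && knownPaths.has(req.url)) {
    res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
    res.end("<h1>Page</h1>");
  } else {
    res.writeHead(404); // a real 404, so broken links dead-end for crawlers
    res.end("Not found");
  }
}).listen(8080);
```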
Recommended Fixes
Google suggests using robots.txt files to block access to problematic URLs, particularly dynamic URLs that generate search results or create infinite spaces. Sites with faceted navigation should implement specific crawling management strategies.
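A robots.txt file along those lines might look like the following sketch; the blocked paths are illustrative assumptions, not paths from Google's documentation:

```
User-agent: *
# Dynamically generated search-result pages
Disallow: /search
# Infinite calendar space
Disallow: /calendar
```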
The updated documentation represents Google's ongoing effort to improve developer resources without changing underlying search behavior. Website owners can reference these guidelines to ensure their URL structures support efficient crawling and indexing.