When two rules in a robots.txt file conflict, the most specific rule wins. For example, if you disallow crawling of /blog/ but allow /blog/shopify-speed-optimizations/, Google will crawl the latter because the Allow rule matches a longer, more specific path. The same principle applies to user agents: a crawler follows only the group of rules written for the most specific user agent that matches it, so if all user agents are disallowed from a section but a separate group allows Googlebot, Googlebot can still crawl it. Use this behavior to make exceptions to broad rules, typically with the Allow command.
In a robots.txt file, the most specific rule takes precedence when commands conflict.
Example:
Disallow: /blog/
Allow: /blog/shopify-speed-optimizations/
Result: Google will crawl /blog/shopify-speed-optimizations/ because the Allow rule is more specific.
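For context, a complete rule group combining these two lines might look like the sketch below. The User-agent: * line is assumed here for illustration; it simply addresses the group to all crawlers.

User-agent: *
Disallow: /blog/
Allow: /blog/shopify-speed-optimizations/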
User-Agent Example:
- All user agents cannot crawl the blog.
- Googlebot can crawl the blog.
Result: Googlebot can crawl the blog, while other user agents cannot.
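Expressed as robots.txt groups, this setup might look like the following sketch (the /blog/ path is assumed for illustration). Because a crawler follows only the group that names it most specifically, Googlebot obeys its own group rather than the broad Disallow.

User-agent: *
Disallow: /blog/

User-agent: Googlebot
Allow: /blog/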
Key Takeaway: Use specific rules to make exceptions to broader rules, such as using the Allow command to permit crawling of particular content within a generally disallowed section.