Gary Illyes from Google has shed new light on a long-standing belief about robots.txt files. In a recent post, Illyes challenges the notion that a website's robots.txt file must always live at the root of its domain (example.com/robots.txt).
Key points:
- Contrary to popular belief, a robots.txt file doesn't have to be located only at example.com/robots.txt.
- Websites can use a centralized robots.txt file, even if it's hosted on a different domain, such as a CDN.
- For example, a site could have two robots.txt files: one at https://cdn.example.com/robots.txt and another at https://www.example.com/robots.txt.
- Webmasters can redirect https://www.example.com/robots.txt to https://cdn.example.com/robots.txt (see the sketch after this list).
- Crawlers compliant with RFC 9309 will follow this redirect and use the target file as the authoritative robots.txt for www.example.com.
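To see this behavior in practice, here is a minimal sketch using Python's standard-library robots.txt parser, which follows HTTP redirects the same way an RFC 9309-compliant crawler would. The example.com URLs and the "MyCrawler" user agent are placeholders, not anything from Illyes' post:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical URL for illustration. urllib follows HTTP redirects by
# default, so if this URL redirects to https://cdn.example.com/robots.txt,
# the rules parsed below come from the CDN-hosted file -- the RFC 9309
# behavior described above.
parser = RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()  # fetches the file, transparently following any redirect

# Check whether a given user agent may crawl a given URL under those rules.
print(parser.can_fetch("MyCrawler", "https://www.example.com/some/page"))
```

From the crawler's perspective, the redirect is invisible: the rules fetched from the CDN are simply applied to www.example.com.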
This revelation offers more flexibility for webmasters, especially those using Content Delivery Networks (CDNs). It allows for easier management of crawl rules across multiple domains or subdomains.
Illyes also pondered whether the file itself needs to be named "robots.txt," hinting at possible future developments or flexibility in the protocol.