John Mueller, Google's Search Advocate, recently addressed a common SEO concern on LinkedIn, providing valuable insights into how Google handles URLs blocked by robots.txt.
Rick, an SEO specialist, encountered an issue where bot-generated backlinks were pointing to non-existent query parameter URLs (?q=[query]) on his site. Despite being blocked by robots.txt and carrying a "noindex" tag, these pages were appearing as "Indexed, though blocked by robots.txt" in Google Search Console (GSC).
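To picture that setup, here is a minimal Python sketch using the standard library's urllib.robotparser. The /search path, domain, and example URLs are hypothetical stand-ins for the ?q= URLs Rick describes, and a plain path prefix is used because Python's parser follows the original robots.txt standard and does not support Google's * wildcard syntax.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the scenario described above: the internal
# search pages that the bot-generated ?q= backlinks point at are
# disallowed for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked = "https://www.example.com/search?q=spammy-anchor-text"
allowed = "https://www.example.com/products/blue-widget"

# A compliant crawler checks robots.txt *before* requesting a page, so a
# disallowed URL is never fetched and any noindex tag in its HTML is never seen.
print(parser.can_fetch("Googlebot", blocked))   # False -> page body (and its noindex) stays invisible to Google
print(parser.can_fetch("Googlebot", allowed))   # True  -> page can be crawled and its meta robots tag read
```

This is exactly the tension behind the "Indexed, though blocked by robots.txt" status: Google knows the URL exists from the backlinks, but the disallow rule prevents it from ever reading the noindex directive, as Mueller explains below.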
Google's Stance
Mueller clarified several key points:
Robots.txt vs. Noindex: If a page is disallowed in robots.txt, Google can't crawl it to see the noindex tag.
Limited Visibility: While these URLs might appear in site-specific queries, they're unlikely to show up in regular search results.
Don't Worry: Mueller advised, "I wouldn't fuss over it," indicating that this situation isn't harmful to overall SEO.
Crawling Without Indexing: Allowing crawling with a noindex tag is acceptable, as it only results in a "crawled/not indexed" status in Search Console (see the sketch after this list).
Critical Factor: The most important thing is to avoid making these pages both crawlable and indexable.
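For sites that want these URLs kept out of the index entirely, the alternative Mueller describes means removing the robots.txt block and serving a noindex directive instead. The sketch below is a hypothetical illustration using Python's built-in http.server: the search URLs stay crawlable, but every response carries noindex via the X-Robots-Tag header (a meta robots tag in the HTML would work the same way).

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class NoindexSearchHandler(BaseHTTPRequestHandler):
    """Hypothetical handler for the crawl-but-don't-index setup:
    no robots.txt disallow, so the crawler can fetch the page and
    see the noindex directive in the response headers."""

    def do_GET(self):
        body = b"<html><body>Internal search results</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        # Because the URL is crawlable, Googlebot fetches it, sees this header,
        # and reports the page as "crawled/not indexed" rather than
        # "Indexed, though blocked by robots.txt".
        self.send_header("X-Robots-Tag", "noindex")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), NoindexSearchHandler).serve_forever()
```

Either configuration is fine by Mueller's reasoning; the one combination to avoid is leaving the pages both crawlable and indexable.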
This advice from Mueller provides clear direction for SEO professionals dealing with similar URL indexing issues, emphasizing a pragmatic approach to managing crawl directives and indexation.