John Mueller, Google's Search Advocate, recently addressed a common SEO concern on LinkedIn, providing valuable insights into how Google handles URLs blocked by robots.txt.
Rick, an SEO specialist, encountered an issue where bot-generated backlinks were pointing to non-existent query parameter URLs (?q=[query]) on his site. Despite being blocked by robots.txt and carrying a "noindex" tag, these pages were appearing as "Indexed, though blocked by robots.txt" in Google Search Console (GSC).
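To picture that setup, here is a minimal Python sketch using the standard library's urllib.robotparser. The /search path, domain, and example URLs are hypothetical stand-ins for the ?q= URLs Rick describes, and a plain path prefix is used because Python's parser follows the original robots.txt standard and does not support Google's * wildcard syntax.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the scenario described above: the internal
# search pages that the bot-generated ?q= backlinks point at are
# disallowed for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked = "https://www.example.com/search?q=spammy-anchor-text"
allowed = "https://www.example.com/products/blue-widget"

# A compliant crawler checks robots.txt *before* requesting a page, so a
# disallowed URL is never fetched and any noindex tag in its HTML is never seen.
print(parser.can_fetch("Googlebot", blocked))   # False -> page body (and its noindex) stays invisible to Google
print(parser.can_fetch("Googlebot", allowed))   # True  -> page can be crawled and its meta robots tag read
```

This is exactly the tension behind the "Indexed, though blocked by robots.txt" status: Google knows the URL exists from the backlinks, but the disallow rule prevents it from ever reading the noindex directive, as Mueller explains below.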
Google's Stance
Mueller clarified several key points:
Robots.txt vs. Noindex: If a page is disallowed in robots.txt, Google can't crawl it to see the noindex tag.
Limited Visibility: While these URLs might appear in site-specific queries, they're unlikely to show up in regular search results.
Don't Worry: Mueller advised, "I wouldn't fuss over it," indicating that this situation isn't harmful to overall SEO.
Crawling Without Indexing: Allowing crawling with a noindex tag is acceptable, as it only results in a "crawled/not indexed" status in Search Console (see the sketch after this list).
Critical Factor: The most important thing is to avoid making these pages both crawlable and indexable.
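For sites that want these URLs kept out of the index entirely, the alternative Mueller describes means removing the robots.txt block and serving a noindex directive instead. The sketch below is a hypothetical illustration using Python's built-in http.server: the search URLs stay crawlable, but every response carries noindex via the X-Robots-Tag header (a meta robots tag in the HTML would work the same way).

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class NoindexSearchHandler(BaseHTTPRequestHandler):
    """Hypothetical handler for the crawl-but-don't-index setup:
    no robots.txt disallow, so the crawler can fetch the page and
    see the noindex directive in the response headers."""

    def do_GET(self):
        body = b"<html><body>Internal search results</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        # Because the URL is crawlable, Googlebot fetches it, sees this header,
        # and reports the page as "crawled/not indexed" rather than
        # "Indexed, though blocked by robots.txt".
        self.send_header("X-Robots-Tag", "noindex")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), NoindexSearchHandler).serve_forever()
```

Either configuration is fine by Mueller's reasoning; the one combination to avoid is leaving the pages both crawlable and indexable.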
This advice from Mueller provides clear direction for SEO professionals dealing with similar URL indexing issues, emphasizing a pragmatic approach to managing crawl directives and indexation.