Cloudflare has identified deceptive crawling behavior by Perplexity, an AI-powered answer engine, which uses stealth, undeclared crawlers to bypass website no-crawl directives. Perplexity first crawls with its declared user agents, but it then switches to stealth crawlers that change their user agents and source ASNs to evade blocks, and that often ignore robots.txt files or never fetch them at all. Because of this behavior, Cloudflare has removed Perplexity from its verified bots list and added managed rules that block the stealth crawling.
Testing Methodology
Cloudflare investigated after customers complained that Perplexity was still accessing their content even though they had disallowed it in robots.txt and blocked it with WAF rules. To test this, Cloudflare created new domains whose robots.txt files disallowed all bots. Perplexity still provided detailed information about the content on those domains, which confirmed the stealth crawling.
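As a rough illustration of the test setup above, a disallow-all robots.txt can be checked locally with Python's standard `urllib.robotparser`. This is a minimal sketch, not Cloudflare's actual tooling: the `ROBOTS_TXT` contents, the `is_allowed` helper, and the `example.com` URLs are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a test domain: every crawler is disallowed
# from every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A compliant crawler that sees this file should not fetch the page.
    print(is_allowed(ROBOTS_TXT, "Perplexity-User/1.0",
                     "https://example.com/article"))
```

A crawler that never fetches robots.txt, or fetches it and ignores the result, is unaffected by this check, which is why the directive alone did not stop the behavior described here.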
Observed Stealth Behavior
Perplexity uses both declared user agents (e.g., Perplexity-User/1.0) and a stealth user agent mimicking Google Chrome on macOS. The stealth crawler rotates through multiple IPs and ASNs not officially associated with Perplexity to evade detection and blocks. This activity spans tens of thousands of domains and millions of requests daily. When blocked, Perplexity resorts to other data sources, resulting in less detailed answers.
User Agent Type | User Agent String | Daily Requests |
---|---|---|
Declared | Perplexity-User/1.0 | 20-25 million |
Stealth | Chrome on macOS impersonation | 3-6 million |
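To make the distinction in the table concrete, the sketch below buckets request user agents from a log. It is only an illustration under stated assumptions: the `classify_user_agent` and `tally` helpers, the bucket names, and the sample Chrome-on-macOS string are hypothetical, and the only token taken from the text is Perplexity-User. It also shows the limit of user-agent matching: an impersonating crawler lands in the same bucket as a real browser, so string matching alone cannot identify it.

```python
from collections import Counter
from typing import Iterable

# Declared crawler token taken from the table above.
DECLARED_TOKENS = ("Perplexity-User",)


def classify_user_agent(ua: str) -> str:
    """Bucket a User-Agent header value by simple string matching."""
    lowered = ua.lower()
    if any(token.lower() in lowered for token in DECLARED_TOKENS):
        return "declared"
    # Looks like Chrome on macOS. This may be a real browser or an
    # impersonating crawler; the string cannot tell the two apart.
    if "macintosh" in lowered and "chrome/" in lowered:
        return "browser-like"
    return "other"


def tally(user_agents: Iterable[str]) -> Counter:
    """Count requests per bucket."""
    return Counter(classify_user_agent(ua) for ua in user_agents)


if __name__ == "__main__":
    # Illustrative log sample; the Chrome version is made up.
    sample = [
        "Perplexity-User/1.0",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
        "curl/8.5.0",
    ]
    print(tally(sample))
```

Because the "browser-like" bucket mixes human visitors with impersonators, separating them takes additional signals beyond the header, such as the network-level changes in IPs and ASNs described above.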
Good Crawler Practices
Well-behaved crawlers should:
- Be transparent: identify honestly with a unique user agent, publish declared IP ranges, and provide contact information.
- Be well-behaved: avoid excessive traffic, sensitive data scraping, or stealth tactics.
- Serve a clear purpose: have a defined, publicly accessible reason for crawling.
- Separate activities: use distinct bots for different tasks to allow site owners control.
- Follow rules: respect robots.txt, rate limits, and security protections.
OpenAI exemplifies these practices by respecting robots.txt, clearly identifying crawlers, and not evading blocks. Tests showed ChatGPT-User respects disallow directives and stops crawling when blocked.
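From the crawler's side, several of the practices listed above can be sketched in a few lines: a unique, honest user agent with contact information, a robots.txt check before every fetch, and a delay between requests. This is a minimal sketch, not any vendor's implementation. The `ExampleBot` identity, the contact URL, the `plan_crawl` helper, and the one-second default delay are all assumptions made for illustration.

```python
from typing import List, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical crawler identity: a unique product token plus a contact URL.
USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot-info)"


def plan_crawl(robots_txt: str, urls: List[str],
               user_agent: str = USER_AGENT,
               default_delay: float = 1.0) -> Tuple[List[str], float]:
    """Return the URLs this crawler may fetch and the delay between requests.

    URLs disallowed by robots_txt are dropped rather than retried under
    another identity.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = [url for url in urls if parser.can_fetch(user_agent, url)]
    # Honor Crawl-delay when the site sets one; otherwise use a
    # conservative default chosen for this example.
    delay = parser.crawl_delay(user_agent)
    return allowed, (float(delay) if delay is not None else default_delay)


if __name__ == "__main__":
    robots = "User-agent: ExampleBot\nDisallow: /private/\nCrawl-delay: 2\n"
    allowed, delay = plan_crawl(robots, [
        "https://example.com/public",
        "https://example.com/private/report",
    ])
    print(allowed, delay)
```

The design point is that a disallowed URL simply falls out of the plan. A well-behaved crawler does not respond to a block by changing its user agent or source network.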
Protection Measures
Cloudflare’s bot management system detects and blocks Perplexity’s stealth crawlers. Customers with existing block or challenge rules are protected. Signature matches for the stealth crawler are included in managed rules available to all customers, including free users.
Future Outlook
Since Content Independence Day, over 2.5 million websites use Cloudflare features to control AI crawler access. Cloudflare expects bot evasion tactics to evolve and continues to adapt defenses. They collaborate with global experts, including IETF, to standardize crawler behavior guidelines.