Cloudflare has introduced a new, free tool to prevent bots from scraping websites for data to train AI models. The tool targets AI scrapers that do not respect the robots.txt file, which traditionally tells bots which pages they can access.
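To make the robots.txt limitation concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `is_allowed` helper, and the example URL are illustrative assumptions, not taken from Cloudflare's announcement. The key point is that the crawler runs this check itself, so a scraper that skips it is not stopped by the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is the check a well-behaved crawler performs voluntarily;
    nothing in robots.txt enforces it on the server side.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"
    print("GPTBot allowed:", is_allowed("GPTBot", url))
    print("Browser allowed:", is_allowed("Mozilla/5.0", url))
```

Because compliance is voluntary, a network-level block is needed for bots that ignore the file.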
Key Features and Functionality
- Bot Detection Models: Cloudflare has fine-tuned automatic bot detection models by analyzing AI bot and crawler traffic. These models can identify bots that mimic the appearance and behavior of legitimate users.
- Easy Blocking: A new "easy button" allows customers to block all AI bots with a single click. This feature is available to all customers, including those on the free tier.
- Continuous Updates: The tool will be updated over time to recognize new bot fingerprints as they are identified.
AI Bot Activity
- Popular AI Bots: The most active AI bots on Cloudflare’s network include Bytespider, Amazonbot, ClaudeBot, and GPTBot. Bytespider, operated by ByteDance, leads in request volume and is frequently blocked.
- Blocking Trends: Although AI bots accessed around 39% of the top one million Internet properties using Cloudflare, only 2.98% of these properties took measures to block or challenge those requests. Higher-ranked properties are more likely to block AI bots.
Detection and Prevention
- Spoofed User Agents: Cloudflare’s machine learning models can detect bots that use spoofed user agents to appear as legitimate browsers. These models score traffic to identify likely bot activity.
- Global Signals: Cloudflare uses global signals from its network, which sees over 57 million requests per second, to decide how much to trust each bot fingerprint and to flag likely bots accurately.
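The two detection ideas above can be sketched as a small decision function. This is a toy illustration, not Cloudflare's implementation: the `Request` type, the `decide` function, the score scale (assumed here to run from 1 to 99, with lower values meaning more likely automated), and the threshold of 30 are all assumptions made for the example.

```python
from dataclasses import dataclass

# User-agent tokens of the declared AI crawlers named in the text above.
DECLARED_AI_BOTS = ("bytespider", "amazonbot", "claudebot", "gptbot")

# Hypothetical cutoff: scores below this are treated as likely automated.
LIKELY_BOT_THRESHOLD = 30


@dataclass
class Request:
    user_agent: str
    bot_score: int  # assumed scale: 1-99, lower = more likely a bot


def decide(request: Request) -> str:
    """Return "block", "challenge", or "allow" for a request."""
    ua = request.user_agent.lower()
    # Case 1: the bot identifies itself honestly in its user agent.
    if any(token in ua for token in DECLARED_AI_BOTS):
        return "block"
    # Case 2: the user agent looks like a browser (possibly spoofed),
    # but the traffic score says the request is likely automated.
    if request.bot_score < LIKELY_BOT_THRESHOLD:
        return "challenge"
    return "allow"


if __name__ == "__main__":
    print(decide(Request("Mozilla/5.0 (compatible; GPTBot/1.0)", 50)))
    print(decide(Request("Mozilla/5.0 (Windows NT 10.0) Chrome/120", 5)))
    print(decide(Request("Mozilla/5.0 (Windows NT 10.0) Chrome/120", 90)))
```

Matching on the user agent alone only catches bots that declare themselves; the score check is what handles a scraper presenting a browser-like user agent.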
Cloudflare’s new tool gives website owners a way to protect their content from unauthorized AI scraping. It combines bot detection models with a one-click blocking option that is available to all customers, including those on the free tier.