Reddit CEO Demands Microsoft Pay for Scraping Site Data

August 01, 2024 at 3:30:56 PM

TL;DR Reddit CEO Steve Huffman insists Microsoft and others must pay to scrape the site’s data, following deals with Google and OpenAI. Huffman criticized Microsoft, Anthropic, and Perplexity for refusing to negotiate, leading to blocking their crawlers. Reddit updated its robots.txt to block non-paying crawlers, affecting visibility on Bing. Huffman argues public data isn't freeware and seeks licensing deals similar to OpenAI's SearchGPT.

Reddit CEO Demands Microsoft Pay for Scraping Site Data

Reddit CEO Steve Huffman is urging Microsoft and other companies to pay for scraping Reddit's data, following agreements with Google and OpenAI. Huffman criticized Microsoft, Anthropic, and Perplexity for refusing to negotiate, leading Reddit to block these companies from accessing its data. Reddit updated its robots.txt file in July to block unauthorized web crawlers, resulting in Reddit content being visible only in Google search results, where Reddit is compensated.

Huffman accused Microsoft of using Reddit's data to train its AI and summarizing content in Bing results without permission, and selling this data through the Bing API. He referenced Microsoft AI CEO Mustafa Suleyman's comment that public internet data is "freeware" to highlight the issue. Microsoft responded by stating they respect websites' directives not to use their content with generative AI models.

Huffman emphasized that the traditional value exchange between search engines and content providers is changing, with search, summarization, and training merging. He pointed to OpenAI's SearchGPT, which includes Reddit results due to a licensing deal, as a model for future agreements. Reddit aims to join traditional media publishers in seeking payment for their content used in generative AI.

Anthropic confirmed they have blocked Reddit from their web crawler since mid-May, respecting the robots.txt file. Microsoft and Perplexity did not provide further comments.