Reddit restricts Wayback Machine access amid AI scraping concerns

August 12, 2025 at 2:42:53 AM

TL;DR Reddit is blocking the Internet Archive's Wayback Machine from indexing most of its content after AI companies scraped Reddit data from the archive. The Wayback Machine will now only archive Reddit's homepage, cutting off posts, comments, and profiles in order to protect user privacy and enforce platform policies. Reddit has also restricted scraper tools, requires payment for data access, has deals with Google and OpenAI, and sued Anthropic for unauthorized scraping.

Reddit is blocking the Internet Archive’s Wayback Machine from indexing most of its content after discovering that AI companies scraped Reddit data from the archive. The Wayback Machine will now only be able to archive Reddit’s homepage, so individual post pages, comments, and profiles will no longer be captured. Reddit says the restriction is meant to protect user privacy and enforce its platform policies, especially regarding deleted content.

Background and Reasoning

Reddit spokesperson Tim Rathschmidt explained that AI companies violated platform policies by scraping Reddit data from the Wayback Machine, prompting Reddit to limit the archive’s access. Reddit contacted the Internet Archive beforehand to inform it of the upcoming restrictions. The Internet Archive’s mission is to preserve digital content, but Reddit believes some of its data should not be archived in this way until better protections are in place.

Reddit’s Approach to Data Access and AI

Reddit has a history of restricting access to its data to prevent abuse by AI companies. It has made deals with Google and OpenAI that give them authorized access to its data, and it blocks major search engines from crawling its site unless they pay. The company’s 2023 API changes, which led to third-party app shutdowns and protests, were also motivated by concerns over misuse of its data for AI training. Additionally, Reddit sued Anthropic, alleging that it continued to scrape Reddit data despite assurances that it would stop.

Internet Archive’s Response

Mark Graham, director of the Wayback Machine, stated that the Internet Archive has a longstanding relationship with Reddit and is continuing discussions with the company about these issues.