OpenAI is developing a tool called Media Manager, set to be released in 2025, which will allow creators to control how their works are used in machine learning research and training. The tool aims to identify copyrighted text, images, audio, and video across multiple sources and reflect creator preferences. It is part of OpenAI's effort to position itself as an ethical actor in the AI industry.
The need for Media Manager arises from the limitations of existing protections against AI data scraping. Currently, creators can add directives to the robots.txt file on their websites to tell AI crawlers not to scrape their content. However, this solution is insufficient for creators who post work on platforms they don't control, or who wish to exempt only certain works from AI training. OpenAI says Media Manager will offer more granular, per-work control.
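For context, the existing robots.txt opt-out works by addressing OpenAI's documented crawler, GPTBot, by its user-agent name. A minimal sketch (the `/essays/` path is a hypothetical example, not an OpenAI requirement):

```
# Block OpenAI's GPTBot from the entire site
User-agent: GPTBot
Disallow: /

# Alternatively, exclude only one directory (hypothetical path)
User-agent: GPTBot
Disallow: /essays/
```

Note that this only covers sites where the creator controls the root directory, which is exactly the gap Media Manager is meant to address.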
The development of Media Manager comes in response to criticism and legal action against AI companies, including OpenAI, for scraping the web for data without express permission, consent, or compensation from creators. OpenAI has defended its practices by pointing to the long-standing acceptance of web crawling and scraping by many companies.
Despite ongoing legal issues, OpenAI aims to present itself as a cooperative and ethical entity. However, some creators may view this move as too little, too late, since their works have already been used to train AI models. OpenAI has stated that it does not preserve copies of scraped data, but rather uses the data to generate new content and ideas.
The Media Manager tool could potentially offer a more efficient and user-friendly way to block AI training than existing options. However, it remains unclear whether creators will trust the tool, and whether it will be able to block training by rival models.