Bluesky’s open API allows third parties to scrape user data for AI training. Although Bluesky itself does not use user content for AI training, others can access and use this data. A report by 404 Media revealed that a machine learning librarian at Hugging Face extracted 1 million public posts from Bluesky via its Firehose API for research purposes. The dataset was later removed amid controversy, but the episode highlighted that public posts on Bluesky are accessible to anyone.
Bluesky is exploring ways to let users signal their consent preferences to outside parties, but it cannot enforce these preferences outside its own systems. The company stated that respecting these settings is up to external developers. Bluesky is in discussions with engineers and lawyers and plans to provide updates soon.
As Bluesky gains popularity, it faces the same scrutiny as other major social platforms.