Summary
AI firms such as OpenAI, Google, and Meta are struggling to find enough high-quality training data. OpenAI trained its GPT-4 model using YouTube videos, a move seen as legally questionable. Google used YouTube transcripts, while Meta considered using copyrighted materials. The companies are exploring solutions such as synthetic data and curriculum learning, because their demand for data may outpace the supply of new content by 2028.