Google's Gemini uses information from publicly accessible sources, together with Gemini Apps information, to improve and develop its products, services, and machine-learning technologies. However, Google does not say which sources these are or what it counts as "publicly accessible."
The company does not explain what is included or excluded, or what precautions it takes to protect personally identifiable information. This opacity is common across the tech industry, and the secrecy surrounding how AI systems are trained raises questions.
Companies tend to focus their disclosures on user input data, sometimes letting people use the system anonymously and advising them not to enter confidential information. The training dataset itself, however, usually stays hidden, including where the data comes from and how it is composed.
This lack of transparency is not unique to Google. In one interview, for instance, OpenAI's Mira Murati could not specify the sources of Sora's training data. Luiza suggests a mandatory information sheet, accessible from the product, detailing the training dataset's sources, composition, possible biases, and similar characteristics.
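To make the idea concrete, here is a minimal sketch of what such an information sheet might contain, expressed as a Python data structure. Every field name below is a hypothetical illustration; the proposal itself does not prescribe any particular format.

```python
from dataclasses import dataclass

# Hypothetical sketch of the proposed "information sheet"; field names
# are illustrative assumptions, not an existing standard.
@dataclass
class TrainingDataSheet:
    model_name: str                # the product this sheet is attached to
    data_sources: list[str]        # named corpora, crawl snapshots, etc.
    collection_period: str         # when the data was gathered
    composition: dict[str, float]  # share of each source (sums to 1.0)
    pii_safeguards: list[str]      # steps taken to protect personal data
    known_biases: list[str]        # documented gaps or skews
    excluded_content: list[str]    # what was deliberately left out

# Example of what a published sheet could look like (all values invented):
sheet = TrainingDataSheet(
    model_name="example-model",
    data_sources=["public web crawl", "licensed news archive"],
    collection_period="2021-2023",
    composition={"public web crawl": 0.8, "licensed news archive": 0.2},
    pii_safeguards=["email and phone redaction", "deduplication"],
    known_biases=["overrepresentation of English-language text"],
    excluded_content=["paywalled content", "opt-out domains"],
)
print(sheet.composition)
```

Published in a machine-readable form like this alongside each product, such a sheet would be in the spirit of earlier documentation efforts such as "Datasheets for Datasets" and model cards.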
Luiza argues that stronger transparency obligations on AI companies are needed to move beyond this opaque phase of the "AI economy."