An internal Google database obtained by 404 Media shows Google recording children's voices, saving license plates from Street View, and many other self-reported incidents, large and small.
Google has accidentally collected children’s voice data, leaked the trips and home addresses of carpool users, and made YouTube recommendations based on users’ deleted watch history, among thousands of other employee-reported privacy incidents, according to a copy of an internal Google database obtained by 404 Media. The database tracks six years’ worth of potential privacy and security issues.
Each incident on its own might affect only a small number of people or get fixed quickly. But together, these incidents show how one of the world's biggest and most important companies handles, and often mishandles, a huge amount of personal and sensitive data about people's lives.
The data obtained by 404 Media includes privacy and security issues that Google’s own employees reported internally. These include issues with Google’s products or data collection practices; vulnerabilities in third-party vendors that Google uses; or mistakes made by Google staff, contractors, or others that have impacted Google systems or data. The incidents include everything from a single errant email containing some PII to substantial data leaks to impending raids on Google offices.
Incident Examples
License Plates in Street View
In one 2016 case, a Google employee reported that Google Street View’s systems were transcribing and storing license plate numbers from photos. They explained that Google uses an algorithm to detect text in Street View imagery.
Exposure of Email Addresses
Another incident involved the public exposure of more than one million users’ email addresses from Socratic.org, a company that Google acquired. The data was viewable in the page source of the company’s website, the report says. Geolocation information and IP addresses of users were also suspected to be available. Those impacted included children.
Children’s Speech Data
A Google speech service logged all audio, including an estimated 1,000 speech utterances from children, for around an hour. “Estimated 1K child speech utterances were collected. The team deleted all logged speech data from the affected period,” the report read.
Government Data Leak
In another incident, a customer of Google’s cloud product for government clients who need to protect sensitive data was inadvertently transitioned to a consumer-level product. “As a result of an accidental SKU migration to G Suite for Business, US data location is no longer guaranteed for this customer,” the report says.
Google's Response
Google told 404 Media in a statement:
At Google, employees can quickly flag potential product issues for review by the relevant teams. When an employee submits the flag, they suggest the priority level to the reviewer. The reports obtained by 404 are from over six years ago and are examples of these flags; every one was reviewed and resolved at that time. In some cases, these employee flags turned out not to be issues at all or were issues that employees found in third-party services.
404 Media obtained the large dataset from an anonymous tipster who did not provide their real name. 404 Media then verified the integrity of the dataset; Google also confirmed aspects of its contents.