Internal documentation for Google's Content Warehouse API has leaked, revealing insights into Google's search algorithms. The leak describes the data Google stores about content, links, and user interactions, but it does not reveal how those features are weighted in scoring functions.
Key Points
Ranking Systems and Features: The documentation outlines 2,596 modules with 14,014 attributes related to various Google services like YouTube, Assistant, and web documents. These modules are part of a monolithic repository, meaning all code is stored in one place and accessible by any machine on the network.
Google's Misleading Statements:
- Domain Authority: Despite Google's denials that it uses a domain authority metric, the documentation reveals a feature called "siteAuthority," indicating Google does measure sitewide authority.
- Clicks for Rankings: Contrary to Google's public denials, systems like NavBoost use click data to influence rankings.
- Sandbox: The documentation mentions a "hostAge" attribute used "to sandbox fresh spam in serving time," contradicting Google's denial of a sandbox.
- Chrome Data: Despite denials, the documentation shows that Chrome data is used in ranking algorithms.
Architecture: Google's ranking system is a series of microservices rather than a single algorithm. Key systems include Trawler (crawling), Alexandria (indexing), Mustang (ranking), and SuperRoot (query processing).
Twiddlers: These are re-ranking functions that adjust search results before they are presented to users. Examples include NavBoost, QualityBoost, and RealTimeBoost.
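To make the idea of re-ranking functions concrete, here is a minimal sketch of a twiddler-style pipeline. It is not Google's implementation: the `Result` fields, the boost formulas, and the function names `click_boost` and `freshness_boost` are all hypothetical, chosen only to show how independent functions could adjust an already scored result list before it is presented.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Result:
    url: str
    score: float                # score from the primary ranking stage
    click_signal: float = 0.0   # hypothetical normalized click signal in [0, 1]
    is_fresh: bool = False      # hypothetical freshness flag


# A twiddler takes a scored result list and returns an adjusted one.
Twiddler = Callable[[List[Result]], List[Result]]


def click_boost(results: List[Result]) -> List[Result]:
    # Hypothetical: scale each score by up to 50% based on click data.
    return [replace(r, score=r.score * (1.0 + 0.5 * r.click_signal)) for r in results]


def freshness_boost(results: List[Result]) -> List[Result]:
    # Hypothetical: add a flat bonus to results flagged as fresh.
    return [replace(r, score=r.score + 0.1) if r.is_fresh else r for r in results]


def apply_twiddlers(results: List[Result], twiddlers: List[Twiddler]) -> List[Result]:
    # Run each twiddler in order, then sort by the adjusted score.
    for twiddler in twiddlers:
        results = twiddler(results)
    return sorted(results, key=lambda r: r.score, reverse=True)


initial = [
    Result("a.example", 1.0),
    Result("b.example", 0.9, click_signal=0.6),
    Result("c.example", 0.85, is_fresh=True),
]
reranked = apply_twiddlers(initial, [click_boost, freshness_boost])
print([r.url for r in reranked])  # ['b.example', 'a.example', 'c.example']
```

In this toy example the second result overtakes the first purely because of its click signal, which is the kind of post-ranking adjustment the leak attributes to systems like NavBoost.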
SEO Implications:
- Panda Algorithm: Panda uses a scoring modifier based on user behavior and external links, applied at various levels (domain, subdomain, subdirectory).
- Authors: Google explicitly stores author information, indicating the importance of authorship in rankings.
- Demotions: Various demotions are applied, including for anchor mismatch, SERP dissatisfaction, and exact match domains.
- Links: Links remain important, with metrics like sourceType indicating the value of links based on where a page is indexed.
- Content: Google scores the originality of short content and counts tokens. Because only a limited number of tokens per document appears to be considered, key content should be placed early.
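The point about token counts and early placement can be illustrated with a small sketch. It assumes a system that only considers the first `max_tokens` tokens of a document; the whitespace tokenizer, the cap value, and the helper names are hypothetical and stand in for whatever Google actually uses.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    # Assumption: naive lowercase whitespace tokenization.
    return text.lower().split()


def considered_tokens(text: str, max_tokens: int) -> List[str]:
    # Keep only the first max_tokens tokens; everything after is ignored.
    return tokenize(text)[:max_tokens]


def phrase_survives(text: str, phrase: str, max_tokens: int) -> bool:
    # True if the phrase appears entirely within the considered prefix.
    kept = considered_tokens(text, max_tokens)
    target = tokenize(phrase)
    n = len(target)
    return any(kept[i:i + n] == target for i in range(len(kept) - n + 1))


early = "blue widgets buying guide with a long introduction that follows"
late = "a long introduction that rambles before the blue widgets buying guide"

print(phrase_survives(early, "blue widgets", max_tokens=5))  # True
print(phrase_survives(late, "blue widgets", max_tokens=5))   # False
```

Under a cap like this, the same key phrase counts only when it appears before the cutoff, which is why front-loading important content is the safer choice.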
Open Questions: The author speculates on whether the Helpful Content Update is related to "Baby Panda" and on what NSR stands for, guessing that it may mean Neural Semantic Retrieval.
Strategic Advice: The author advises creating great content, promoting it well, and continuing to experiment and test SEO strategies.
The leak validates many long-held SEO beliefs and provides a clearer picture of Google's ranking mechanisms, emphasizing the importance of quality content, user engagement, and strategic link building.
Update: Google has confirmed the authenticity of the leaked documents.