Google’s search algorithm is perhaps the most consequential system on the Internet, dictating which sites live and die and what content on the web looks like. But exactly how Google ranks websites has long been a mystery, pieced together by journalists, researchers and people working in search engine optimization.
Now, an explosive leak that purports to show thousands of pages of internal documents appears to offer an unprecedented look under the hood of how Search works, and suggests that Google hasn’t been completely honest about it for years. So far, Google has not responded to multiple requests for comment on the legitimacy of the documents.
Rand Fishkin, who has worked in SEO for more than a decade, says a source shared 2,500 pages of documents with him in hopes that reporting the leak would counter the “lies” Google employees have shared about how the search algorithm works. The documents outline Google’s search API and break down what information is available to employees, according to Fishkin.
The details shared by Fishkin are dense and technical, probably more readable for developers and SEO experts than for non-specialists. The leaked content also isn’t necessarily proof that Google is using the specific data and signals it mentions for search rankings. Rather, the leak outlines what data Google collects from web pages, sites and search engines, and offers indirect hints to SEO experts about what Google appears to be interested in, as SEO expert Mike King writes in his review of the documents.
The leaked documents touch on topics such as what kind of data Google collects and uses, which sites Google elevates for sensitive topics like elections, how Google handles small websites, and more. Some of the information in the documents appears to contradict public statements by Google officials, according to Fishkin and King.
“‘Lied’ is harsh, but it’s the only accurate word to use here,” King wrote. “While I don’t necessarily blame Google’s public representatives for protecting their own information, I do resist their efforts to actively discredit people in the worlds of marketing, technology and journalism who have presented reproducible findings.”
Google has not responded to The Verge’s requests for comment about the documents, including a direct request to refute their legitimacy. Fishkin told The Verge in an email that the company has not disputed the veracity of the leak, but that an employee asked him to change some of the language in the post regarding how an event was characterized.
Google’s secretive search algorithm has spawned an entire industry of marketers who closely follow Google’s public guidelines and implement them for millions of companies around the world. The pervasive, often annoying tactics have led to a common narrative that Google Search results are getting worse, cluttered with the junk that website operators feel obligated to produce in order to get their sites seen. In response to The Verge’s past reporting on SEO-driven tactics, Google representatives often fall back on a familiar defense: that is not what Google’s guidelines say.
But some details in the leaked documents call into question the accuracy of Google’s public statements about how Search works.
One example cited by Fishkin and King is whether Google Chrome data is used in the rankings at all. Google representatives have repeatedly stated that they do not use Chrome data to rank pages, but Chrome is specifically mentioned in sections about how websites appear in Search. In the screenshot below, which I captured as an example, the links appearing under the main vogue.com URL may have been created in part using Chrome data, according to the docs.
Another question raised is what role, if any, E-E-A-T plays in ranking. E-E-A-T stands for experience, expertise, authoritativeness and trustworthiness, a Google metric used to evaluate the quality of results. Google officials have previously said that E-E-A-T is not a ranking factor. Fishkin notes that he has not found much in the documents that mentions E-E-A-T by name.
However, King detailed how Google appears to collect author data from a page and has a field for whether a person on the page is the author. Some of the documents shared by King say that the field was “primarily designed and set up for news articles … but has also been populated for other content (e.g. research papers).” Although this does not confirm that bylines are an explicit ranking factor, it shows that Google is at least tracking this attribute. Google officials have previously insisted that bylines are something website owners should provide for readers, not for Google, because they don’t affect rankings.
While the documents aren’t exactly a smoking gun, they provide a deep, unfiltered look into a heavily guarded black box system. The US government’s antitrust case against Google – which revolves around Search – has also led to the release of internal documentation offering further insight into how the company’s core product works.
Google’s general caginess about how Search works has resulted in websites looking the same, as SEO marketers try to outsmart Google based on hints the company offers. Fishkin also calls out publications that credulously present Google’s public claims as truth without much further analysis.
“Historically, some of the search industry’s loudest voices and most prolific publishers have been happy to uncritically repeat Google’s public statements. They write headlines like ‘Google says XYZ is true’ instead of ‘Google claims XYZ; the evidence points to the opposite,’” Fishkin wrote. “Please do better. If this leak and the DOJ trial can create just one change, I hope this is it.”