About the size of Google Scholar: playing the numbers
Enrique Orduña-Malea, Juan Manuel Ayllón, Alberto Martín-Martín, Emilio Delgado López-Cózar

TL;DR
This paper estimates the current size of Google Scholar using four empirical methods, finding it to be around 160 million documents, while highlighting significant inconsistencies and uncertainties in the estimates.
Contribution
It introduces and applies four empirical methods to estimate Google Scholar's size, revealing the challenges and limitations in accurately measuring its scope.
Findings
Estimated size of Google Scholar is about 160 million documents.
All methods show high inconsistencies and uncertainties.
Raises questions about Google's transparency regarding the size of its index.
Abstract
The emergence of academic search engines (Google Scholar and Microsoft Academic Search essentially) has revived and increased the interest in the size of the academic web, since their aspiration is to index the entirety of current academic knowledge. The search engine functionality and human search patterns lead us to believe, sometimes, that what you see in the search engine's results page is all that really exists. And, even when this is not true, we wonder which information is missing and why. The main objective of this working paper is to calculate the size of Google Scholar at present (May 2014). To do this, we present, apply and discuss up to 4 empirical methods: Khabsa & Giles's method, an estimate based on empirical data, and estimates based on direct queries and absurd queries. The results, despite providing disparate values, place the estimated size of Google Scholar in about…
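The abstract names Khabsa & Giles's method without giving its formula. That method is usually described as a capture-recapture estimate, so the following is only a minimal sketch of the textbook Lincoln-Petersen estimator, not the authors' actual procedure. The function name and all numbers are hypothetical and chosen purely for illustration.

```python
def lincoln_petersen(n1: int, n2: int, overlap: int) -> float:
    """Textbook Lincoln-Petersen population estimate.

    n1      -- documents found in the first source (hypothetical count)
    n2      -- documents found in the second source (hypothetical count)
    overlap -- documents found in both sources
    """
    if n1 < 0 or n2 < 0:
        raise ValueError("sample sizes must be non-negative")
    if overlap <= 0:
        # With no overlap the estimator is undefined (division by zero).
        raise ValueError("overlap must be positive")
    if overlap > min(n1, n2):
        raise ValueError("overlap cannot exceed either sample size")
    # N is approximately n1 * n2 / overlap
    return n1 * n2 / overlap


# Illustrative numbers only, not taken from the paper.
estimate = lincoln_petersen(n1=1000, n2=800, overlap=400)
print(estimate)
```

The estimate is very sensitive to the overlap count: a small overlap inflates the result sharply. That sensitivity is one generic reason size estimates of this kind can diverge.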
Taxonomy
Topics: Scientometrics and bibliometrics research · Web visibility and informetrics · Web Data Mining and Analysis
