Can we use Google Scholar to identify highly-cited documents?
Alberto Martín-Martín, Enrique Orduna-Malea, Anne-Wil Harzing, Emilio Delgado López-Cózar

TL;DR
This study empirically evaluates Google Scholar's effectiveness in identifying highly-cited scientific documents, finding it reliable despite some language and version identification biases.
Contribution
It provides the first large-scale longitudinal analysis demonstrating Google Scholar's capability to accurately identify highly-cited papers across diverse sources.
Findings
A document's position in Google Scholar's search results correlates strongly with its citation count (r = -0.67).
Google Scholar effectively identifies highly-cited documents across different years.
Language and version identification biases have minimal impact on citation ranking accuracy.
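The negative sign of the reported correlation can be illustrated with a minimal sketch. It assumes Pearson's r (this summary does not state which coefficient the authors used), and the `ranks` and `citations` values are hypothetical toy data, not figures from the study. Because position 1 is the top result, a negative coefficient means that documents with more citations tend to appear earlier in the results.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data (an assumption for illustration only):
# position in the result list (1 = first result) and citation count.
ranks = [1, 2, 3, 4, 5, 6, 7, 8]
citations = [950, 720, 810, 400, 430, 150, 210, 90]

r = pearson_r(ranks, citations)
print(f"r = {r:.2f}")  # negative: higher-cited documents sit nearer the top
```

If the ordering were instead compared purely by rank (position against citation rank), Spearman's coefficient would be the natural alternative; which of the two underlies the reported value is not specified here.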
Abstract
The main objective of this paper is to empirically test whether the identification of highly-cited documents through Google Scholar is feasible and reliable. To this end, we carried out a longitudinal analysis (1950 to 2013), running a generic query (filtered only by year of publication) to minimise the effects of academic search engine optimisation. This gave us a final sample of 64,000 documents (1,000 per year). The strong correlation between a document's citations and its position in the search results (r= -0.67) led us to conclude that Google Scholar is able to identify highly-cited papers effectively. This, combined with Google Scholar's unique coverage (no restrictions on document type and source), makes the academic search engine an invaluable tool for bibliometric research relating to the identification of the most influential scientific documents. We find evidence, however,…
