Red alert: Millions of "homeless" publications in Scopus should be resettled
Weishu Liu, Haifeng Wang

TL;DR
This paper investigates the issue of millions of records in Scopus labeled as 'country-undefined', identifies the causes, and proposes solutions to improve the accuracy of author affiliation data.
Contribution
It systematically analyzes the causes of 'homeless' records in Scopus and offers recommendations to address this data quality problem.
Findings
Four primary causes are identified: incomplete author affiliation addresses, failure to recognize variants of country/territory names, misspelled country/territory names, and address splitting errors.
Recommendations are proposed to resettle 'homeless' records and improve data accuracy.
The impact on research evaluation and bibliometric analysis is highlighted.
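To make three of the causes listed above concrete (incomplete addresses, unrecognized name variants, and misspellings), here is a minimal Python sketch of how a country might be resolved from an affiliation string. It is not Scopus' actual pipeline or the authors' method. The alias table, the `resolve_country` function, the 0.8 similarity cutoff, and the sample addresses are all hypothetical, and the sketch assumes the country, when present, is the last comma-separated segment of the address.

```python
import difflib

# Hypothetical alias table: normalized variants mapped to one canonical name.
# Keys are lowercase with periods removed.
ALIASES = {
    "usa": "United States",
    "united states": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
    "china": "China",
    "pr china": "China",
    "germany": "Germany",
    "deutschland": "Germany",
}


def resolve_country(affiliation, cutoff=0.8):
    """Return a canonical country name, or None if the record stays 'homeless'."""
    # Assumption: the country, if given, is the last comma-separated segment.
    segments = [s.strip() for s in affiliation.split(",") if s.strip()]
    if not segments:
        return None
    candidate = segments[-1].lower().replace(".", "").strip()

    # Name variants: exact lookup in the alias table.
    if candidate in ALIASES:
        return ALIASES[candidate]

    # Misspellings: fall back to the closest known name above the cutoff.
    close = difflib.get_close_matches(candidate, list(ALIASES), n=1, cutoff=cutoff)
    if close:
        return ALIASES[close[0]]

    # Incomplete address: no recognizable country in the last segment.
    return None


if __name__ == "__main__":
    samples = [
        "Dept. of Physics, MIT, Cambridge, MA, U.S.A.",          # variant
        "School of Management, Zhejiang University, P.R. China",  # variant
        "Institute of Physics, Heidelberg, Germnay",              # misspelling
        "Harvard Medical School, Boston",                         # incomplete
    ]
    for address in samples:
        print(address, "->", resolve_country(address))
```

A real resolver would need a far larger alias list and a carefully tuned cutoff, since loose fuzzy matching can assign the wrong country. Address splitting errors are not covered by this sketch.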
Abstract
Scopus is increasingly regarded as a high-quality and reliable data source for research and evaluation of scientific and scholarly activity. However, a puzzling phenomenon has been discovered by chance: millions of records that carry author affiliation information are labeled "country-undefined" by Scopus, a problem rarely seen in its counterpart Web of Science. Such a huge number of "homeless" records is unacceptable for a widely used, high-quality bibliographic database. Using data from the past 124 years, this brief communication examines these affiliated but country-undefined records in Scopus. Our analysis identifies four primary causes for these "homeless" records: incomplete author affiliation addresses, Scopus' inability to recognize different variants of country/territory names, misspelled country/territory names in author…
