TL;DR
This study compares how five major bibliometric databases classify publication and document types, revealing significant inconsistencies and variations in their taxonomies and curation strategies.
Contribution
It provides a comprehensive analysis of classification differences across OpenAlex, Web of Science, Scopus, PubMed, and Semantic Scholar using a large shared dataset.
Findings
Large differences in document type classification among databases
Many records lack publication types in OpenAlex but are classified as conference proceedings elsewhere
Significant variation in curation strategies across sources
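The kind of comparison behind these findings can be pictured as a cross-tabulation of document-type labels for records that two databases share. The sketch below is only an illustration under stated assumptions: the DOIs, the labels, and the `cross_tabulate` helper are all hypothetical and are not taken from the study's corpus or code.

```python
from collections import Counter

# Hypothetical toy records keyed by DOI. Labels are illustrative only and
# are not drawn from the study's data.
openalex = {
    "10.1/a": "article",
    "10.1/b": None,  # record with no publication type assigned
    "10.1/c": "article",
    "10.1/d": None,
}
scopus = {
    "10.1/a": "article",
    "10.1/b": "conference paper",
    "10.1/c": "editorial",
    "10.1/d": "conference paper",
}


def cross_tabulate(left, right):
    """Count (left_type, right_type) pairs over records present in both sources."""
    shared = left.keys() & right.keys()
    return Counter((left[doi], right[doi]) for doi in shared)


table = cross_tabulate(openalex, scopus)

# Print one row per label pair, sorted for stable output.
for (left_type, right_type), n in sorted(
    table.items(), key=lambda kv: (str(kv[0][0]), kv[0][1])
):
    print(f"{str(left_type):>10} -> {right_type:<18} {n}")
```

Off-diagonal cells in such a table (for example, a missing type on one side paired with a conference label on the other) are where classification disagreements between sources would show up.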
Abstract
The assignment of document and publication types in scholarly databases plays an important role in bibliometrics, for example in decision-making or university rankings. However, scholarly databases apply different curation strategies and taxonomies when classifying documents, which makes it difficult to compare results across database providers. In this study, the bibliometric databases OpenAlex, Web of Science, Scopus, PubMed and Semantic Scholar are used to analyse the extent of data variation and to compare different approaches to taxonomy and data curation. Using a shared corpus of 9,575,603 publications from 2012 to 2022, we found large differences in how these databases classify document types such as research articles and editorials. We also show that many of the records that lack a publication type in OpenAlex are classified as conference proceedings in…
