Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics
Peter Sjögårde, Per Ahlgren

TL;DR
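This study proposes a methodology for determining a suitable granularity level for publication-level classifications at the level of research topics. It builds a baseline classification from synthesis papers and their references, compares it with citation-based classifications of different granularity, and aims to improve bibliometric analyses.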
Contribution
It introduces a novel, data-driven approach to identifying the most meaningful topic granularity in publication classifications, tested on a large-scale citation dataset.
Findings
The methodology effectively identifies a suitable topic granularity level.
The resulting classifications align well with research topics in case studies.
Most publications are grouped into moderately sized classes.
Abstract
The purpose of this study is to find a theoretically grounded, practically applicable and useful granularity level of an algorithmically constructed publication-level classification of research publications (ACPLC). The level addressed is the level of research topics. The methodology we propose uses synthesis papers and their reference articles to construct a baseline classification. A dataset of about 31 million publications and their mutual citation relations is used to obtain several ACPLCs of different granularity. Each ACPLC is compared to the baseline classification, and the best-performing ACPLC is identified. The results of two case studies show that the topics of the cases are closely associated with different classes of the identified ACPLC, and that these classes tend to treat only one topic. Further, the class size variation is moderate, and only a small proportion of the…
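The comparison step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes (1) that each synthesis paper's reference list forms one baseline class, (2) that publications cited by more than one synthesis paper are simply dropped so that the baseline is a partition, and (3) that the adjusted Rand index is used to score each candidate ACPLC against the baseline. The abstract does not specify the comparison measure or how shared references are handled, and the function names `build_baseline`, `adjusted_rand_index` and `best_acplc` are hypothetical.

```python
from collections import Counter
from math import comb


def build_baseline(synthesis_refs):
    """Hypothetical baseline: each synthesis paper's reference list is one class.

    Assumption: publications cited by more than one synthesis paper are dropped,
    so every remaining publication belongs to exactly one baseline class.
    """
    # Count in how many synthesis papers each reference appears.
    counts = Counter(ref for refs in synthesis_refs.values() for ref in set(refs))
    baseline = {}
    for synthesis_id, refs in synthesis_refs.items():
        for ref in refs:
            if counts[ref] == 1:
                baseline[ref] = synthesis_id
    return baseline


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two equally long label sequences."""
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Pairs of items that share a class in both partitions (contingency cells).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # only when both partitions are trivial and identical
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def best_acplc(baseline, acplcs):
    """Score each candidate ACPLC against the baseline; return the best and all scores.

    `acplcs` maps a granularity name to a dict {publication_id: class_id}.
    Only publications present in both the baseline and the ACPLC are compared.
    """
    scores = {}
    for name, acplc in acplcs.items():
        shared = [pub for pub in baseline if pub in acplc]
        scores[name] = adjusted_rand_index(
            [baseline[p] for p in shared], [acplc[p] for p in shared]
        )
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy data: two synthesis papers and three candidate granularities.
    synthesis_refs = {"review_1": ["p1", "p2", "p3"], "review_2": ["p4", "p5", "p6"]}
    acplcs = {
        "coarse": {p: 0 for p in ["p1", "p2", "p3", "p4", "p5", "p6"]},
        "medium": {"p1": 0, "p2": 0, "p3": 0, "p4": 1, "p5": 1, "p6": 1},
        "fine": {p: i for i, p in enumerate(["p1", "p2", "p3", "p4", "p5", "p6"])},
    }
    best, scores = best_acplc(build_baseline(synthesis_refs), acplcs)
    print(best, scores)
```

In this toy example, both the too-coarse candidate (everything in one class) and the too-fine candidate (every publication in its own class) score 0, while the candidate matching the synthesis-paper grouping scores 1. This illustrates why comparing several granularities against a single baseline can single out one level.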
