Measuring publication relatedness using controlled vocabularies
Emil Dolmer Alnor

TL;DR
This paper reviews and benchmarks controlled-vocabulary-based measures of publication relatedness, introducing a new measure and comparing it to existing ones using genomics data to inform research applications.
Contribution
It develops a new relatedness measure based on controlled vocabularies and benchmarks it against existing measures using TREC Genomics data.
Findings
The new measure and Ahlgren et al.'s measure have different strengths.
Benchmark results vary depending on research context.
Controlled vocabularies can effectively measure publication relatedness.
Abstract
Measuring the relatedness between scientific publications has important applications in many areas of bibliometrics and science policy. Controlled vocabularies provide a promising basis for measuring relatedness because they address issues that arise when using citation or textual similarity to measure relatedness. While several controlled-vocabulary-based relatedness measures have been developed, there exists no comprehensive and direct test of their accuracy and suitability for different types of research questions. This paper reviews existing measures, develops a new measure, and benchmarks the measures using TREC Genomics data as a ground truth of topics. The benchmark tests show that the new measure and the measure proposed by Ahlgren et al. (2020) have differing strengths and weaknesses. These results inform a discussion of which method to choose when studying interdisciplinarity,…
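To make the idea of a controlled-vocabulary-based relatedness measure concrete, the sketch below scores two publications by the overlap of their assigned vocabulary terms, weighting rarer terms more heavily. It is a generic illustration only: it is not the new measure developed in the paper, nor the measure of Ahlgren et al. (2020). The function names, the IDF weighting, the cosine formulation, and the toy term sets are all assumptions chosen for the example.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Inverse document frequency per term (hypothetical weighting scheme)."""
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    return {term: math.log(n / count) for term, count in df.items()}

def relatedness(terms_a, terms_b, weights):
    """Cosine similarity between two IDF-weighted binary term vectors."""
    a, b = set(terms_a), set(terms_b)
    dot = sum(weights[t] ** 2 for t in a & b)
    norm_a = math.sqrt(sum(weights[t] ** 2 for t in a))
    norm_b = math.sqrt(sum(weights[t] ** 2 for t in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a publication with only ubiquitous terms carries no signal
    return dot / (norm_a * norm_b)

# Toy corpus: publication id -> assigned vocabulary terms (invented data).
docs = {
    "p1": {"Genomics", "Neoplasms", "Humans"},
    "p2": {"Genomics", "Neoplasms", "Mice"},
    "p3": {"Bibliometrics", "Humans"},
}
weights = idf_weights(docs)

score_12 = relatedness(docs["p1"], docs["p2"], weights)
score_13 = relatedness(docs["p1"], docs["p3"], weights)
print(f"p1 vs p2: {score_12:.3f}")
print(f"p1 vs p3: {score_13:.3f}")
```

In this toy corpus, p1 and p2 share two terms while p1 and p3 share one, so the first pair scores higher. A benchmark of the kind the abstract describes would compare such scores against a ground truth of topics instead of hand-picked examples.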
Taxonomy
Topics: Scientometrics and Bibliometrics Research · Computational and Text Analysis Methods · Biomedical Text Mining and Ontologies
