Measuring the relatedness between scientific publications using controlled vocabularies

Emil Dolmer Alnor

arXiv:2602.14755·cs.DL·February 17, 2026

Open Access

TL;DR

This paper compares three methods for measuring the relatedness between scientific publications using controlled vocabularies and finds that soft cosine is more accurate than traditional cosine similarity.

Contribution

Introduces two methods, soft cosine and maximum term similarities, that account for the semantic similarity between non-matching controlled-vocabulary terms when measuring relatedness.

Findings

1. Soft cosine is the most accurate method tested.

2. Traditional cosine similarity is less accurate than the new methods.

3. Results have implications for bibliometric analyses using controlled vocabularies.

Abstract

Measuring the relatedness between scientific publications is essential in many areas of bibliometrics and science policy. Controlled vocabularies provide a promising basis for measuring relatedness and are widely used in combination with Salton's cosine similarity. The latter is problematic because it only considers exact matches between terms. This article introduces two alternative methods - soft cosine and maximum term similarities - that account for the semantic similarity between non-matching terms. The article compares the accuracy of all three methods using the assignment of publications to topics in the TREC 2006 Genomics Track and the assumption that accurate relatedness measures should assign high relatedness scores to publication pairs within the same topic and low scores to pairs from separate topics. Results show that soft cosine is the most accurate method, while the most…
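The difference between exact-match cosine and the two alternatives named in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the four-term vocabulary, the term-similarity matrix `S`, the binary term weights, and the topic labels are all invented for illustration. Soft cosine follows its commonly used definition, a^T S b / sqrt((a^T S a)(b^T S b)). The abstract does not define maximum term similarities, so `max_term_similarity` is an assumed form (average best match per term, symmetrised) and may differ from the article's.

```python
from math import sqrt

# Hypothetical controlled-vocabulary terms (index order matters for S and the vectors).
TERMS = ["Neoplasms", "Breast Neoplasms", "Genes, BRCA1", "Zebrafish"]

# Hypothetical term-to-term similarities in [0, 1]; symmetric, with 1 on the diagonal.
S = [
    [1.0, 0.8, 0.3, 0.0],
    [0.8, 1.0, 0.4, 0.0],
    [0.3, 0.4, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def bilinear(a, S, b):
    """Return a^T S b for term-weight vectors a, b and similarity matrix S."""
    return sum(a[i] * S[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def cosine(a, b):
    """Salton's cosine: only terms assigned to both publications contribute."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def soft_cosine(a, b, S):
    """Soft cosine: non-matching terms contribute in proportion to S[i][j]."""
    return bilinear(a, S, b) / sqrt(bilinear(a, S, a) * bilinear(b, S, b))


def max_term_similarity(a, b, S):
    """Assumed form: mean best-match similarity per term, averaged over both directions."""
    ia = [i for i, w in enumerate(a) if w]
    ib = [j for j, w in enumerate(b) if w]
    a_to_b = sum(max(S[i][j] for j in ib) for i in ia) / len(ia)
    b_to_a = sum(max(S[i][j] for i in ia) for j in ib) / len(ib)
    return (a_to_b + b_to_a) / 2


# Three toy publications as binary term vectors (1 = term assigned).
pub_a = [1, 0, 1, 0]  # Neoplasms; Genes, BRCA1
pub_b = [0, 1, 1, 0]  # Breast Neoplasms; Genes, BRCA1
pub_c = [0, 0, 0, 1]  # Zebrafish

# Assume pub_a and pub_b belong to the same topic and pub_c to a different one.
pairs = {"same topic (a, b)": (pub_a, pub_b), "different topics (a, c)": (pub_a, pub_c)}

for label, (x, y) in pairs.items():
    print(
        label,
        "cosine=%.3f" % cosine(x, y),
        "soft_cosine=%.3f" % soft_cosine(x, y, S),
        "max_term=%.3f" % max_term_similarity(x, y, S),
    )
```

On this toy data, plain cosine scores the same-topic pair 0.5 because only one of the two terms matches exactly, while soft cosine (about 0.93) and the assumed maximum-term measure (0.9) also credit the near match between "Neoplasms" and "Breast Neoplasms". All three measures score the cross-topic pair 0, which is the pattern the abstract's evaluation assumption rewards.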

Taxonomy

Topics: Scientometrics and Bibliometrics Research · Biomedical Text Mining and Ontologies · Computational and Text Analysis Methods