Measuring Research Interest Similarity with Transition Probabilities
Attila Varga, Sadamori Kojaku, Filipi Nascimento Silva

TL;DR
This paper proposes a citation-based similarity measure built on transition probabilities in citation networks. The measure outperforms existing methods at classifying papers and predicting co-authorships, and it comes with a Python package.
Contribution
It introduces a new similarity measure based on random walk transition probabilities that is continuous, symmetric, and applicable to any citation network.
Findings
The basic TP measure outperforms personalized PageRank and Node2vec in classification tasks
Proposed metrics effectively predict future co-authors across scales
The approach can estimate individual researchers' interest similarity
Abstract
We introduce a family of paper and author similarity measures based on the concept that papers are more similar if they are more likely to be retrieved during a literature search following backward and forward citations. Since this browsing process resembles a walk in a citation network, we operationalize the concept using the transition probability (TP) of random walkers. The proposed measures are continuous, symmetric, and can be implemented on any citation network. We conduct validation tests of the TP concept and other extant alternatives to gauge which metric can classify papers and predict future co-authors most consistently across different scales of analysis (co-authorships, journals, and disciplines). Our results show that the proposed basic TP measure outperforms alternative metrics such as personalized PageRank and the Node2vec machine-learning technique in classification…
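The random-walk idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation or their package's API. It assumes a walker that follows citations in both directions with equal probability, a fixed number of steps, and a simple average of the two directions to obtain a symmetric score. The function names `transition_matrix` and `tp_similarity`, the `steps` parameter, and the toy network are all hypothetical.

```python
import numpy as np


def transition_matrix(adj):
    """Row-stochastic transition matrix of a walker on a citation network.

    adj[i, j] = 1 means paper i cites paper j. Assumption: the walker may
    follow a citation backward or forward, so the graph is treated as undirected.
    """
    adj = np.asarray(adj, dtype=float)
    undirected = np.maximum(adj, adj.T)
    degree = undirected.sum(axis=1, keepdims=True)
    # Papers with no citation links keep an all-zero row.
    return np.divide(undirected, degree,
                     out=np.zeros_like(undirected), where=degree > 0)


def tp_similarity(adj, steps=2):
    """Symmetric similarity from `steps`-step transition probabilities.

    Assumption: symmetry is obtained by averaging the probability of
    reaching j from i with the probability of reaching i from j.
    """
    p = transition_matrix(adj)
    p_t = np.linalg.matrix_power(p, steps)
    return (p_t + p_t.T) / 2


# Toy network of four papers: 0 cites 1 and 2, 1 cites 2, 3 cites 2.
citations = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
])
sim = tp_similarity(citations, steps=2)
```

Here `sim[i, j]` is a continuous score between 0 and 1 that is the same in both directions, which matches the properties the abstract lists. The step count and the symmetrization rule are design choices of this sketch only.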
Taxonomy
Topics: Scientific Computing and Data Management · Scientometrics and Bibliometrics Research · Biomedical Text Mining and Ontologies
