Predicting Long-Term Citations from Short-Term Linguistic Influence
Sandeep Soni, David Bamman, Jacob Eisenstein

TL;DR
This paper introduces a method to quantify linguistic influence in timestamped documents. The resulting influence scores predict future citations of research papers better than baseline predictors such as initial citation counts and lexical features.
Contribution
It proposes a new approach combining contextual embeddings and Hawkes processes to measure linguistic influence and predict future citations.
Findings
Linguistic influence scores correlate with future citation counts.
The method outperforms baseline predictors including initial citations and lexical features.
Influence measurement is effective using only two years of post-publication data.
Abstract
A standard measure of the influence of a research paper is the number of times it is cited. However, papers may be cited for many reasons, and citation count offers limited information about the extent to which a paper affected the content of subsequent publications. We therefore propose a novel method to quantify linguistic influence in timestamped document collections. There are two main steps: first, identify lexical and semantic changes using contextual embeddings and word frequencies; second, aggregate information about these changes into per-document influence scores by estimating a high-dimensional Hawkes process with a low-rank parameter matrix. We show that this measure of linguistic influence is predictive of citations: the estimate of linguistic influence from the two years after a paper's publication is correlated with and predictive of its citation count…
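The second step of the abstract (a Hawkes process whose parameter matrix is low-rank) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the paper's actual model: the exponential kernel, the factorization A = U V^T, the function names (`low_rank_matrix`, `intensity`, `outgoing_influence`), the idea of summing a column of A as a toy influence score, and all numbers are hypothetical.

```python
import math

def low_rank_matrix(U, V):
    """Return A = U V^T as nested lists.

    A[i][j] is how strongly an event from source j excites target i.
    (Assumed parameterization, for illustration only.)
    """
    rank = len(U[0])
    return [[sum(u[r] * v[r] for r in range(rank)) for v in V] for u in U]

def intensity(t, target, events, mu, A, decay=1.0):
    """Hawkes intensity of `target` at time t with an exponential kernel.

    events: list of (time, source) pairs; only events strictly before t count.
    mu: base rates, one per dimension.
    """
    excitation = sum(
        A[target][src] * decay * math.exp(-decay * (t - s))
        for s, src in events
        if s < t
    )
    return mu[target] + excitation

def outgoing_influence(A, source):
    """Toy influence score: total excitation `source` exerts on other dimensions."""
    return sum(A[i][source] for i in range(len(A)) if i != source)

# Toy setup: 3 sources and rank-1 factors. All values are made up.
U = [[0.0], [0.5], [0.2]]
V = [[1.0], [0.0], [0.0]]
A = low_rank_matrix(U, V)
mu = [0.1, 0.1, 0.1]
events = [(0.0, 0)]  # source 0 produces one event at time 0

score = outgoing_influence(A, 0)
rate = intensity(1.0, 1, events, mu, A)
```

In this toy setup only source 0 has a nonzero column in A, so it is the only source that raises the event rate of the others. The low-rank factorization keeps the number of parameters proportional to the rank rather than to the square of the number of dimensions, which is the usual motivation for such a constraint in a high-dimensional setting.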
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Complex Network Analysis Techniques
