ForeCite: Adapting Pre-Trained Language Models to Predict Future Citation Rates of Academic Papers

Gavin Hull; Alex Bihlo

arXiv:2505.08941 · cs.LG · May 15, 2025

TL;DR

ForeCite attaches a linear regression head to pre-trained causal language models to predict the average monthly citation rate of academic papers. It outperforms the previous state-of-the-art by 27 points and shows consistent gains across model sizes and data volumes.
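
The core idea, a scalar linear head on top of a causal language model, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the fixed random table `_TABLE` is a hypothetical stand-in for the model's last-layer hidden states, reading the final token's state is an assumed pooling choice (common for causal models, but not specified here), and the squared-error objective with SGD updates is likewise assumed. All names (`hidden_states`, `last_token_state`, `LinearHead`, `mse`) are hypothetical.

```python
import random

HIDDEN_DIM = 8
VOCAB_SIZE = 100
_rng = random.Random(0)
# Hypothetical frozen "backbone": a fixed random table standing in for the
# last-layer hidden states of a pre-trained causal language model.
_TABLE = [[_rng.gauss(0.0, 1.0) for _ in range(HIDDEN_DIM)] for _ in range(VOCAB_SIZE)]


def hidden_states(token_ids):
    """Placeholder for the LM forward pass: one hidden vector per token."""
    return [_TABLE[t % VOCAB_SIZE] for t in token_ids]


def last_token_state(token_ids):
    # Assumed pooling: in a causal LM only the final position has attended to
    # every token, so its hidden state summarizes the whole input text.
    return hidden_states(token_ids)[-1]


class LinearHead:
    """Scalar regression head: y_hat = w . h + b."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def predict(self, h):
        return sum(wi * hi for wi, hi in zip(self.w, h)) + self.b

    def sgd_step(self, h, target, lr=0.01):
        # Gradient of 0.5 * (y_hat - target)^2 with respect to w and b.
        err = self.predict(h) - target
        self.w = [wi - lr * err * hi for wi, hi in zip(self.w, h)]
        self.b -= lr * err


def mse(head, data):
    """Mean squared error of the head over (token_ids, target) pairs."""
    return sum((head.predict(last_token_state(x)) - y) ** 2 for x, y in data) / len(data)


# Toy "papers" (token-id sequences) paired with made-up monthly citation rates.
train = [([3, 41, 5], 2.0), ([7, 12, 17], 0.5)]
head = LinearHead(HIDDEN_DIM)
before = mse(head, train)
for _ in range(200):
    for tokens, rate in train:
        head.sgd_step(last_token_state(tokens), rate)
after = mse(head, train)
print(f"train MSE before={before:.3f} after={after:.5f}")
```

Because the backbone is frozen in this sketch, only `w` and `b` are learned; it does not model any fine-tuning of the language model itself.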

Contribution

The paper introduces ForeCite, a framework that adapts pre-trained causal language models to citation-rate prediction by adding a linear regression head, achieving state-of-the-art performance.

Findings

1. Achieves a test correlation of ρ = 0.826 on a curated dataset of 900K+ biomedical papers published between 2000 and 2024.
2. Outperforms the previous state-of-the-art by 27 points.
3. Shows consistent gains across model sizes and data volumes.
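
The headline metric is a correlation between predicted and observed citation rates on held-out papers. The symbol ρ is not defined on this page; the sketch below assumes it denotes Spearman's rank correlation (the conventional meaning of the symbol), computed as the Pearson correlation of average ranks so that ties are handled. The function names and toy numbers are illustrative only.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the rank-transformed data.
    return pearson(ranks(x), ranks(y))


predicted = [0.1, 0.4, 0.35, 0.8]  # hypothetical model outputs
observed = [0.0, 0.3, 0.5, 1.2]    # hypothetical true monthly citation rates
print(f"Spearman rho = {spearman(predicted, observed):.3f}")
```

A rank correlation rewards getting the ordering of papers right rather than the exact rate, which suits heavy-tailed targets such as citation counts.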

Abstract

Predicting the future citation rates of academic papers is an important step toward the automation of research evaluation and the acceleration of scientific progress. We present ForeCite, a simple but powerful framework that augments pre-trained causal language models with a linear head for average monthly citation rate prediction. Adapting transformers for regression tasks, ForeCite achieves a test correlation of ρ = 0.826 on a curated dataset of 900K+ biomedical papers published between 2000 and 2024, a 27-point improvement over the previous state-of-the-art. Comprehensive scaling-law analysis reveals consistent gains across model sizes and data volumes, while temporal holdout experiments confirm practical robustness. Gradient-based saliency heatmaps suggest a potentially undue reliance on titles and abstract texts. These results establish a new state-of-the-art in…
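
The regression target and the temporal holdout can be illustrated together. This sketch rests on two loudly stated assumptions: that the "average monthly citation rate" is total citations divided by whole months elapsed between publication and a snapshot date, and that the holdout trains on papers published before a cutoff year and tests on later ones. The `Paper` record, `cutoff_year`, the snapshot date, and the example papers are all hypothetical; ForeCite's actual target definition and split are not given here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Paper:
    # Hypothetical record; field names are illustrative only.
    title: str
    published: date
    citations: int


def months_between(start, end):
    """Whole months from start to end (at least 1 to avoid division by zero)."""
    return max(1, (end.year - start.year) * 12 + (end.month - start.month))


def monthly_citation_rate(paper, snapshot):
    # Assumed target: total citations divided by months since publication.
    return paper.citations / months_between(paper.published, snapshot)


def temporal_split(papers, cutoff_year):
    """Train on papers published before cutoff_year, test on the rest."""
    train = [p for p in papers if p.published.year < cutoff_year]
    test = [p for p in papers if p.published.year >= cutoff_year]
    return train, test


snapshot = date(2025, 1, 1)
papers = [
    Paper("A", date(2005, 3, 1), 240),
    Paper("B", date(2019, 6, 1), 55),
    Paper("C", date(2023, 1, 1), 12),
]
train, test = temporal_split(papers, cutoff_year=2020)
for p in train + test:
    print(p.title, round(monthly_citation_rate(p, snapshot), 3))
```

Splitting by publication date rather than at random keeps every test paper newer than every training paper, which is what lets such a holdout probe forward-looking robustness.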

Taxonomy

Topics: scientometrics and bibliometrics research · Topic Modeling · Artificial Intelligence in Healthcare and Education