Are Large Language Models able to Predict Highly Cited Papers? Evidence from Statistical Publications
Zhanshuo Ye, Yiming Hou, Rui Pan, Tianchen Gao, Hansheng Wang

TL;DR
This study investigates whether large language models can predict highly cited statistical papers using early textual information, demonstrating stable performance and revealing topic patterns associated with high impact.
Contribution
The paper introduces a text-centered framework leveraging LLMs and structured prompts to predict long-term citation impact from early publication data in the field of statistics.
Findings
LLMs achieve stable, competitive prediction performance.
Predicted highly cited papers focus on topics like causal inference and deep learning.
The approach generalizes well over different publication periods.
Abstract
Predicting highly cited papers is a long-standing challenge due to the complex interactions of research content, scholarly communities, and temporal dynamics. Recent advances in large language models (LLMs) raise the question of whether early-stage textual information can provide useful signals of long-term scientific impact. Focusing on statistical publications, we propose a flexible, text-centered framework that leverages LLMs and structured prompt design to predict highly cited papers. Specifically, we utilize information available at the time of publication, including titles, abstracts, keywords, and limited bibliographic metadata. Using a large corpus of statistical papers, we evaluate predictive performance across multiple publication periods and alternative definitions of highly cited papers. The proposed approach achieves stable and competitive performance relative to existing…
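To make the setup in the abstract concrete, here is a minimal Python sketch of two steps it mentions: labeling papers as highly cited, and assembling a structured prompt from publication-time fields. Everything specific here is an assumption for illustration, not the authors' implementation: the `Paper` record and its field names, the top-fraction-within-year definition of "highly cited", the `top_fraction` default, and the prompt wording are all hypothetical. The call to an actual LLM is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Paper:
    # Hypothetical record; field names are illustrative, not from the paper.
    title: str
    abstract: str
    keywords: List[str]
    journal: str
    year: int
    citations: int = 0  # observed years later; never shown to the model


def label_highly_cited(papers: List[Paper], top_fraction: float = 0.1) -> List[bool]:
    """Label papers whose citation count reaches the top `top_fraction`
    cutoff within their own publication year (one assumed definition)."""
    counts_by_year: Dict[int, List[int]] = {}
    for p in papers:
        counts_by_year.setdefault(p.year, []).append(p.citations)

    cutoffs: Dict[int, int] = {}
    for year, counts in counts_by_year.items():
        ranked = sorted(counts, reverse=True)
        k = max(1, int(len(ranked) * top_fraction))  # at least one paper per year
        cutoffs[year] = ranked[k - 1]

    # Ties at the cutoff are all labeled positive, so a year may exceed k.
    return [p.citations >= cutoffs[p.year] for p in papers]


def build_prompt(paper: Paper) -> str:
    """Assemble a structured prompt using only publication-time information."""
    return "\n".join([
        "You are assessing a newly published statistics paper.",
        f"Title: {paper.title}",
        f"Abstract: {paper.abstract}",
        f"Keywords: {', '.join(paper.keywords)}",
        f"Journal: {paper.journal}",
        f"Year: {paper.year}",
        "Question: Will this paper become highly cited? Answer Yes or No.",
    ])
```

Changing `top_fraction` is one simple way to explore alternative definitions of a highly cited paper, and grouping by year keeps older papers from dominating the positive class simply because they have had longer to accumulate citations. Note that `citations` is used only for labeling and is never written into the prompt.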
Taxonomy
Topics: Scientometrics and Bibliometrics Research · Academic Publishing and Open Access · Computational and Text Analysis Methods
