Are Large Language Models able to Predict Highly Cited Papers? Evidence from Statistical Publications

Zhanshuo Ye; Yiming Hou; Rui Pan; Tianchen Gao; Hansheng Wang

arXiv:2601.13627 · stat.AP · January 21, 2026


Open Access

TL;DR

This study investigates whether large language models (LLMs) can predict highly cited statistical papers from information available early, at the time of publication. It reports stable predictive performance and identifies topic patterns associated with high impact.

Contribution

The paper introduces a text-centered framework leveraging LLMs and structured prompts to predict long-term citation impact from early publication data in the field of statistics.
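The page does not reproduce the authors' prompts. As a rough illustration of what a "structured prompt" over publication-time information could look like, here is a minimal sketch. The `PaperRecord` schema, the `build_prompt` and `parse_answer` helpers, the prompt wording, the 10-year horizon, and the YES/NO answer format are all hypothetical assumptions, not the paper's design.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PaperRecord:
    """Hypothetical schema for information available at publication time."""
    title: str
    abstract: str
    keywords: List[str] = field(default_factory=list)
    journal: str = ""
    year: Optional[int] = None


def build_prompt(paper: PaperRecord, horizon_years: int = 10) -> str:
    """Assemble a structured prompt asking an LLM for a binary impact judgment.

    The wording and fields are illustrative assumptions only.
    """
    keywords = ", ".join(paper.keywords) if paper.keywords else "(none listed)"
    year = paper.year if paper.year is not None else "(unknown)"
    lines = [
        "You are an expert in statistics and research evaluation.",
        f"Judge whether the following paper will be highly cited within "
        f"{horizon_years} years of publication.",
        "",
        f"Title: {paper.title}",
        f"Keywords: {keywords}",
        f"Journal: {paper.journal or '(unknown)'}",
        f"Publication year: {year}",
        f"Abstract: {paper.abstract}",
        "",
        "Answer with exactly one word: YES or NO.",
    ]
    return "\n".join(lines)


def parse_answer(response: str) -> Optional[bool]:
    """Map the model's reply to a label; return None if it is not a clean YES/NO."""
    stripped = response.strip()
    if not stripped:
        return None
    # Take the first word, drop a trailing period, and compare case-insensitively.
    token = stripped.split()[0].strip(".").upper()
    if token == "YES":
        return True
    if token == "NO":
        return False
    return None
```

Keeping the fields in a fixed, labeled order makes the prompt easy to vary (for example, dropping the journal line to test how much the metadata contributes). A constrained answer format keeps parsing trivial and lets unparseable replies be counted separately.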

Findings

1. LLMs achieve stable, competitive prediction performance.
2. Papers predicted to be highly cited concentrate on topics such as causal inference and deep learning.
3. The approach generalizes well across different publication periods.

Abstract

Predicting highly cited papers is a long-standing challenge due to the complex interactions of research content, scholarly communities, and temporal dynamics. Recent advances in large language models (LLMs) raise the question of whether early-stage textual information can provide useful signals of long-term scientific impact. Focusing on statistical publications, we propose a flexible, text-centered framework that leverages LLMs and structured prompt design to predict highly cited papers. Specifically, we utilize information available at the time of publication, including titles, abstracts, keywords, and limited bibliographic metadata. Using a large corpus of statistical papers, we evaluate predictive performance across multiple publication periods and alternative definitions of highly cited papers. The proposed approach achieves stable and competitive performance relative to existing…
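The abstract mentions "alternative definitions of highly cited papers" evaluated across publication periods, but the page does not state those definitions. One common way to build such a label is a top-fraction cutoff within each publication-year cohort. The minimal sketch below assumes that definition. The `label_highly_cited` and `precision_recall` helpers, the per-year cohorts, and the tie rule are hypothetical illustrations, not the authors' procedure.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple


def label_highly_cited(papers: Dict[str, Tuple[int, int]],
                       top_fraction: float = 0.1) -> Dict[str, bool]:
    """Label papers in the top `top_fraction` by citations within their publication year.

    `papers` maps a paper id to (publication_year, citation_count). Comparing
    within a year keeps older papers from dominating simply because they have
    had more time to accumulate citations. Papers tied at the cutoff are all
    labeled highly cited, so a cohort can exceed its nominal quota.
    """
    cohorts = defaultdict(list)
    for pid, (year, cites) in papers.items():
        cohorts[year].append((pid, cites))

    labels = {}
    for members in cohorts.values():
        counts = sorted((c for _, c in members), reverse=True)
        k = max(1, math.ceil(top_fraction * len(counts)))
        cutoff = counts[k - 1]  # citation count of the k-th most cited paper
        for pid, cites in members:
            labels[pid] = cites >= cutoff
    return labels


def precision_recall(predicted: Dict[str, bool],
                     actual: Dict[str, bool]) -> Tuple[float, float]:
    """Precision and recall of predicted 'highly cited' labels against actual ones."""
    tp = sum(1 for pid, y in actual.items() if y and predicted.get(pid, False))
    pred_pos = sum(1 for pid in actual if predicted.get(pid, False))
    act_pos = sum(1 for y in actual.values() if y)
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / act_pos if act_pos else 0.0
    return precision, recall
```

For example, with `papers = {"a": (2015, 120), "b": (2015, 8), "c": (2015, 30), "d": (2016, 5), "e": (2016, 60)}` and `top_fraction=0.5`, papers `a`, `c`, and `e` are labeled highly cited. Changing `top_fraction` (say, from 0.1 to 0.05) is one simple way to compare alternative definitions of "highly cited."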


Taxonomy

Topics: scientometrics and bibliometrics research · Academic Publishing and Open Access · Computational and Text Analysis Methods