LLM-Metrics: Measuring Research Impact Through Large Language Model Memory
Si Shen, Wenhua Zhao, Danhao Zhu

TL;DR
This paper introduces LLM-Metrics, a novel research impact measure based on large language models' memory, which correlates with citation counts and offers a real-time, bias-resistant alternative for assessing research influence.
Contribution
The paper proposes a new impact metric derived from LLMs' parametric memory, validated through experiments across multiple models and disciplines, demonstrating its potential as a citation-independent assessment tool.
Findings
Weak but positive correlation between LLM-Metrics and citation counts (rho = 0.1495).
Stronger predictive signal for recent papers with low citation counts.
Smaller models like Llama-3.2-3B outperform larger models in predictive power.
Abstract
Citation counts remain the dominant metric for assessing research impact, yet they suffer from well-documented limitations: temporal lag, disciplinary bias, and Matthew effects. Here we propose LLM-Metrics, a research-impact assessment metric derived from the parametric memory of large language models (LLMs). The central hypothesis is that high-impact papers receive greater exposure in the academic community, that this exposure enters LLM training data in textual form, and that models consequently form stronger parametric memory of these papers. We designed four types of multiple-choice probes, covering title recognition, author recognition, method recognition, and venue recognition, and evaluated 549 computer science papers published in 2023-2024 across 17 LLMs spanning 0.5B to 72B parameters from six vendors. Of the 17 models, 15 produced positive predictions, 9 of which were…
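The abstract does not say how the four probe types are combined into a single score or which correlation statistic the reported rho refers to, so the following is only a minimal sketch under stated assumptions: each paper's memory score is taken to be the fraction of its four probes a model answers correctly, and the score is compared with citation counts using a Spearman rank correlation. The names `memory_score`, `average_ranks`, `spearman_rho`, and the toy records are hypothetical and are not taken from the paper.

```python
from statistics import mean

# The four probe types named in the abstract.
PROBE_TYPES = ("title", "author", "method", "venue")


def memory_score(probe_results):
    """Fraction of the four probes answered correctly for one paper.

    Assumption: equal weighting of probe types; the paper may aggregate differently.
    """
    return mean(1.0 if probe_results[p] else 0.0 for p in PROBE_TYPES)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Toy records invented for illustration only; these are not results from the paper.
toy_papers = [
    {"probes": {"title": True, "author": True, "method": True, "venue": True}, "citations": 120},
    {"probes": {"title": True, "author": False, "method": True, "venue": False}, "citations": 15},
    {"probes": {"title": False, "author": False, "method": False, "venue": False}, "citations": 2},
    {"probes": {"title": True, "author": True, "method": False, "venue": True}, "citations": 40},
]

scores = [memory_score(p["probes"]) for p in toy_papers]
citations = [p["citations"] for p in toy_papers]
rho = spearman_rho(scores, citations)
```

Because the toy scores and citation counts happen to be ordered identically, `rho` here is 1.0; real data would give a much noisier value. A rank-based statistic is a natural choice for this comparison since citation counts are heavily skewed, but that choice is part of the sketch, not a claim about the paper's procedure.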
