Implicit and Explicit Research Quality Score Probabilities from ChatGPT

Mike Thelwall; Yunhan Yang

arXiv:2506.13525·cs.DL·June 17, 2025

Open Access

TL;DR

This study evaluates how ChatGPT's internal probability estimates can be used to assess research article quality, finding that token probability-based scoring offers a cost-effective and accurate ranking method aligned with human quality judgments.

Contribution

It introduces and tests novel strategies using ChatGPT's token probabilities for research quality assessment, demonstrating improved accuracy and cost-effectiveness over explicit likelihood requests.

Findings

01 Token probability-based scores correlate better with human judgments.

02 Explicit likelihood requests decrease scoring accuracy.

03 Token probabilities provide a reliable, cheaper ranking method.

Abstract

The large language model (LLM) ChatGPT's quality scores for journal articles correlate more strongly with human judgements than some citation-based indicators in most fields. Averaging multiple ChatGPT scores improves the results, apparently leveraging its internal probability model. To leverage these probabilities, this article tests two novel strategies: requesting percentage likelihoods for scores and extracting the probabilities of alternative tokens in the responses. The probability estimates were then used to calculate weighted average scores. Both strategies were evaluated with five iterations of ChatGPT 4o-mini on 96,800 articles submitted to the UK Research Excellence Framework (REF) 2021, using departmental average REF2021 quality scores as a proxy for article quality. The data was analysed separately for each of the 34 field-based REF Units of Assessment. For the first…
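
The weighted average score described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the model's alternative tokens for the score position arrive as a mapping from token text to log probability, and that valid scores are the integers 1 to 4 (an assumption based on the REF star scale, which the text above does not state). The function names `weighted_score` and `mean_score` are hypothetical.

```python
import math

# Assumed score scale (1-4); the summary above does not state it.
SCORE_TOKENS = {"1": 1, "2": 2, "3": 3, "4": 4}


def weighted_score(token_logprobs):
    """Probability-weighted average score for one response.

    token_logprobs maps each alternative token at the score position
    to its log probability, e.g. {"3": -0.51, "4": -1.20, "2": -2.30}.
    """
    probs = {}
    for token, logprob in token_logprobs.items():
        text = token.strip()  # tolerate leading or trailing whitespace
        if text in SCORE_TOKENS:
            score = SCORE_TOKENS[text]
            probs[score] = probs.get(score, 0.0) + math.exp(logprob)
    total = sum(probs.values())
    if total == 0.0:
        raise ValueError("no valid score tokens among the alternatives")
    # Renormalise so that non-score tokens do not distort the average.
    return sum(score * p for score, p in probs.items()) / total


def mean_score(iterations):
    """Average the weighted scores from several independent iterations."""
    return sum(weighted_score(it) for it in iterations) / len(iterations)


if __name__ == "__main__":
    one_response = {"3": math.log(0.6), "4": math.log(0.3), "2": math.log(0.1)}
    print(round(weighted_score(one_response), 3))
```

Renormalising over the score tokens is a design choice made here for the sketch; whether the paper treats the probability mass on non-score tokens this way is not stated in the summary.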

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Radiomics and Machine Learning in Medical Imaging · Meta-analysis and systematic reviews