Can ChatGPT evaluate research environments? Evidence from REF2021

Kayvan Kousha; Mike Thelwall; Elizabeth Gadd

arXiv:2512.05202 · cs.DL · December 8, 2025

Open Access

TL;DR

This study evaluates ChatGPT's ability to assess UK research environment statements for REF2021, finding strong correlations with expert scores but cautioning against full automation due to biases and inaccuracies.

Contribution

It demonstrates that LLMs such as ChatGPT can produce research environment scores that correlate highly with expert scores, making them a potential tool for reducing assessor workload and improving consistency.

Findings

1. ChatGPT 4o-mini scores had moderate to strong positive Spearman correlations with REF expert scores in 32 of the 34 Units of Assessment (UoAs).

2. ChatGPT-5 achieved the highest correlation (r=0.81) with REF scores.

3. Caution is advised because LLM assessments may be biased or inaccurate.

Abstract

UK academic departments are evaluated partly on the statements that they write about the value of their research environments for the Research Excellence Framework (REF) periodic assessments. These statements mix qualitative narratives and quantitative data, typically requiring time-consuming and difficult expert judgements to assess. This article investigates whether Large Language Models (LLMs) can support the process or validate the results, using the UK REF2021 unit-level environment statements as a test case. Based on prompts mimicking the REF guidelines, ChatGPT 4o-mini scores correlated positively with expert scores in almost all 34 (field-based) Units of Assessment (UoAs). ChatGPT's scores had moderate to strong positive Spearman correlations with REF expert scores in 32 out of 34 UoAs: 14 UoAs above 0.7 and a further 13 between 0.6 and 0.7. Only two UoAs had weak or no…
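The per-UoA comparison described in the abstract can be illustrated with a minimal sketch of a Spearman rank correlation computed separately for each Unit of Assessment. This is not the authors' code: the function names (`average_ranks`, `spearman`, `spearman_by_uoa`), the UoA labels, and all scores below are invented for illustration, and the paper's actual data and processing pipeline are not shown in this excerpt.

```python
from collections import defaultdict
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_by_uoa(rows):
    """rows: iterable of (uoa, llm_score, expert_score) -> {uoa: rho}."""
    grouped = defaultdict(lambda: ([], []))
    for uoa, llm_score, expert_score in rows:
        grouped[uoa][0].append(llm_score)
        grouped[uoa][1].append(expert_score)
    return {uoa: spearman(llm, expert) for uoa, (llm, expert) in grouped.items()}


# Invented scores for three hypothetical UoAs (not REF2021 data).
toy_rows = [
    ("UoA-A", 3.0, 3.2), ("UoA-A", 2.5, 2.4), ("UoA-A", 3.5, 3.8), ("UoA-A", 2.0, 2.1),
    ("UoA-B", 1.0, 4.0), ("UoA-B", 2.0, 3.0), ("UoA-B", 3.0, 2.0), ("UoA-B", 4.0, 1.0),
    ("UoA-C", 1.0, 1.0), ("UoA-C", 2.0, 2.0), ("UoA-C", 2.0, 3.0), ("UoA-C", 3.0, 4.0),
]
rho_by_uoa = spearman_by_uoa(toy_rows)
```

Because Spearman's coefficient depends only on rank order, it suits comparing LLM scores with expert scores even when the two are on different scales. Each unit needs several statements with varying scores for the coefficient to be meaningful.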

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Radiology practices and education · Meta-analysis and systematic reviews