Prompt perturbation and fraction facilitation sometimes strengthen Large Language Model scores

Mike Thelwall

arXiv:2512.01330 · cs.DL · December 2, 2025

Open Access

TL;DR

This study investigates how prompt design strategies, including perturbations and fractional scoring, can improve Large Language Models' ability to evaluate research quality, revealing model-specific sensitivities and effective averaging techniques.

Contribution

It demonstrates that prompt variations, averaging, and fractional scoring can enhance LLM scoring accuracy, providing practical strategies for prompt engineering in evaluation tasks.

Findings

01. Prompt variations improve scoring consistency

02. Averaging scores from similar prompts enhances reliability

03. Allowing fractional scores reveals model certainty levels

Abstract

Large Language Models (LLMs) can be tasked with scoring texts according to pre-defined criteria and on a defined scale, but there is no recognised optimal prompting strategy for this. This article focuses on the task of LLMs scoring journal articles for research quality on a four-point scale, testing how user prompt design can enhance this ability. Based primarily on 1.7 million Gemma3 27b queries for 2780 health and life science articles with 58 similar prompts, the results show that improvements can be obtained by (a) testing semantically equivalent prompt variations, (b) averaging scores from semantically equivalent prompts, (c) specifying that fractional scores are allowed, and possibly also (d) not drawing attention to the input being partial. Whilst (a) and (d) suggest that models can be sensitive to how a task is phrased, (b) and (c) suggest that strategies to leverage more of…
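
Strategies (b) and (c) can be illustrated with a minimal sketch. It is not the paper's code: the record layout, the function name `average_scores`, the article and prompt identifiers, and the assumption that the four-point scale runs from 1 to 4 are all hypothetical. The sketch assumes each query has already returned a numeric score (possibly fractional), or `None` when no score could be extracted, and simply averages the valid scores per article across prompt variants.

```python
from collections import defaultdict
from statistics import mean

# Assumed scale bounds (hypothetical: the abstract only says "four-point scale").
MIN_SCORE, MAX_SCORE = 1.0, 4.0


def average_scores(records):
    """Average scores per article across semantically equivalent prompts.

    `records` is an iterable of (article_id, prompt_id, score) tuples, where
    `score` is a number (fractional values allowed) or None if the model's
    reply contained no usable score. Out-of-range and missing scores are skipped.
    """
    by_article = defaultdict(list)
    for article_id, _prompt_id, score in records:
        if score is not None and MIN_SCORE <= score <= MAX_SCORE:
            by_article[article_id].append(float(score))
    # One averaged score per article that has at least one valid score.
    return {article: mean(scores) for article, scores in by_article.items()}


# Made-up illustrative data, not results from the paper.
records = [
    ("article-1", "prompt-a", 3.0),
    ("article-1", "prompt-b", 2.5),
    ("article-1", "prompt-c", 3.5),
    ("article-2", "prompt-a", 2.0),
    ("article-2", "prompt-b", None),  # no score extracted for this query
    ("article-2", "prompt-c", 1.5),
]

averaged = average_scores(records)
print(averaged)
```

Averaging in this way turns several coarse scores into a finer-grained value per article, which is one plausible reading of why combining prompt variants and permitting fractional scores could help; the abstract itself does not specify the aggregation details.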

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Computational and Text Analysis Methods · Topic Modeling