SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models

Wonjun Jeong; Dongseok Kim; Taegkeun Whangbo

arXiv:2507.18182 · cs.CL · August 5, 2025

TL;DR

SCOPE is a novel evaluation framework that detects and mitigates position and label biases in large language models, leading to more reliable and fair assessments across multiple-choice tasks.

Contribution

It introduces a dataset-independent method to estimate and counteract position bias in LLM evaluations, improving the accuracy of model performance measurement.
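The estimate-then-counteract step can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation:

- Real model calls are replaced by a simulated list of null-prompt answers.
- The additive smoothing is an added assumption.
- Reading "inverse-bias distribution" as q_i ∝ 1/p_i is also an assumption.
- All function names are hypothetical.

```python
import random
from collections import Counter


def estimate_position_bias(null_choices, n_options, smoothing=1.0):
    """Estimate p_i = P(model picks slot i | content-free prompt).

    null_choices: 0-based slot indices returned by repeated null-prompt calls.
    Additive (Laplace) smoothing keeps every p_i > 0 so 1 / p_i is defined.
    """
    counts = Counter(null_choices)
    total = len(null_choices) + smoothing * n_options
    return [(counts.get(i, 0) + smoothing) / total for i in range(n_options)]


def inverse_bias_distribution(bias):
    """Assumed inverse-bias placement: q_i proportional to 1 / p_i."""
    inv = [1.0 / p for p in bias]
    z = sum(inv)
    return [w / z for w in inv]


def lucky_rate(bias, placement):
    """Probability a content-blind model hits the answer: sum_i p_i * q_i."""
    return sum(p * q for p, q in zip(bias, placement))


def sample_answer_slot(placement, rng):
    """Draw the slot that will hold the correct answer for one item."""
    return rng.choices(range(len(placement)), weights=placement, k=1)[0]


if __name__ == "__main__":
    # Simulated null-prompt answers from a model that favours slot 0 ("A").
    null_choices = [0] * 50 + [1] * 25 + [2] * 15 + [3] * 10
    bias = estimate_position_bias(null_choices, n_options=4)
    q = inverse_bias_distribution(bias)
    print("bias     ", [round(p, 3) for p in bias])
    print("placement", [round(x, 3) for x in q])
    print("lucky rate, uniform placement:", round(lucky_rate(bias, [0.25] * 4), 3))
    print("lucky rate, inverse-bias     :", round(lucky_rate(bias, q), 3))
    print("answer slot for one item:", sample_answer_slot(q, random.Random(0)))
```

With q_i ∝ 1/p_i, every slot contributes the same amount p_i q_i = 1/Σ_j(1/p_j) to the lucky rate. By the arithmetic-harmonic mean inequality, the total n/Σ_j(1/p_j) never exceeds the uniform-chance rate 1/n. It reaches 1/n only when the model shows no position bias.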

Findings

1. SCOPE outperforms existing debiasing methods in benchmark tests.
2. It stabilizes model performance and clarifies confidence distributions.
3. The framework enhances fairness and reliability in LLM evaluation.

Abstract

Large Language Models (LLMs) can achieve inflated scores on multiple-choice tasks by exploiting inherent biases in option positions or labels, rather than demonstrating genuine understanding. This study introduces SCOPE, an evaluation framework designed to measure and mitigate such selection bias in a dataset-independent manner. By repeatedly invoking a null prompt that lacks semantic content, SCOPE estimates each model's unique position-bias distribution. It then redistributes the answer slot according to the inverse-bias distribution, thereby equalizing the lucky-rate, the probability of selecting the correct answer by chance. Furthermore, it prevents semantically similar distractors from being placed adjacent to the answer, thereby blocking near-miss guesses based on superficial proximity cues. Across multiple benchmark experiments, SCOPE consistently outperformed existing debiasing…
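The distractor constraint described in the abstract can be illustrated with a greedy placement rule. This sketch makes several assumptions, none taken from the paper:

- Semantic similarity is cosine similarity over precomputed embeddings (toy vectors here).
- The answer slot has already been chosen.
- The farthest-first rule and the names `place_options` and `cosine` are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def place_options(answer, distractors, embed, answer_slot, rng):
    """Fill the option list so the distractors most similar to the answer land
    in the slots farthest from it (hypothetical farthest-first rule).

    embed maps each option string to a vector; answer_slot is 0-based.
    """
    n = len(distractors) + 1
    # Most similar distractor first.
    ranked = sorted(distractors,
                    key=lambda d: cosine(embed[d], embed[answer]),
                    reverse=True)
    # Free slots, farthest from the answer first; shuffling before the stable
    # sort breaks ties between equally distant slots at random.
    free = [i for i in range(n) if i != answer_slot]
    rng.shuffle(free)
    free.sort(key=lambda i: abs(i - answer_slot), reverse=True)
    options = [None] * n
    options[answer_slot] = answer
    for slot, d in zip(free, ranked):
        options[slot] = d
    return options


if __name__ == "__main__":
    # Toy 2-D "embeddings"; "Lyon" is the near-miss distractor for "Paris".
    embed = {"Paris": [1.0, 0.0], "Lyon": [0.9, 0.1],
             "Berlin": [0.6, 0.4], "Tokyo": [0.1, 0.9]}
    opts = place_options("Paris", ["Lyon", "Berlin", "Tokyo"], embed,
                         answer_slot=1, rng=random.Random(0))
    print(opts)  # "Lyon" ends up in slot 3, not next to "Paris" in slot 1
```

With four options and the answer in slot 1, only slot 3 is non-adjacent to the answer. In that case this rule can keep only the single most similar distractor away from the answer.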

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques