More or Less Wrong: A Benchmark for Directional Bias in LLM Comparative Reasoning

Mohammadamin Shafiei; Hamidreza Saffari; Nafise Sadat Moosavi

arXiv:2506.03923·cs.CL·June 5, 2025

Open Access

TL;DR

This paper uncovers a directional bias in LLM reasoning caused by linguistic framing in comparative questions, introduces MathComp as a benchmark to study this bias, and explores how prompt formats and social context influence model predictions.

Contribution

It introduces MathComp, a benchmark for analyzing framing bias in LLMs, and systematically studies how prompt phrasing and social cues affect reasoning accuracy and bias.

Findings

01 Models exhibit systematic bias toward framing terms like 'more' or 'less'.

02 Chain-of-thought prompting can reduce but not eliminate framing bias.

03 Including demographic terms amplifies directional drift in model predictions.

Abstract

Large language models (LLMs) are known to be sensitive to input phrasing, but the mechanisms by which semantic cues shape reasoning remain poorly understood. We investigate this phenomenon in the context of comparative math problems with objective ground truth, revealing a consistent and directional framing bias: logically equivalent questions containing the words "more", "less", or "equal" systematically steer predictions in the direction of the framing term. To study this effect, we introduce MathComp, a controlled benchmark of 300 comparison scenarios, each evaluated under 14 prompt variants across three LLM families. We find that model errors frequently reflect linguistic steering: systematic shifts toward the comparative term present in the prompt. Chain-of-thought prompting reduces these biases, but its effectiveness varies: free-form reasoning is more robust, while…
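To make the idea of "linguistic steering" concrete, the following is a minimal sketch of one way such a directional bias could be quantified: among a model's wrong answers, count how often the predicted label equals the comparative term used in the prompt. The `Record` type, the `steering_rate` and `error_direction_counts` helpers, and the toy data are all assumptions made for illustration; they are not the paper's metric, code, or data.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical label set mirroring the three framing terms named in the abstract.
LABELS = ("more", "less", "equal")


@dataclass(frozen=True)
class Record:
    """One model response to one prompt variant (illustrative structure only)."""
    framing: str    # comparative term that appears in the prompt
    gold: str       # correct label for the scenario
    predicted: str  # label the model produced


def steering_rate(records):
    """Fraction of errors whose predicted label equals the framing term.

    Returns 0.0 when there are no errors. This is an assumed, simplified
    measure, not the definition used in the paper.
    """
    errors = [r for r in records if r.predicted != r.gold]
    if not errors:
        return 0.0
    steered = sum(1 for r in errors if r.predicted == r.framing)
    return steered / len(errors)


def error_direction_counts(records):
    """Count errors by (framing term, predicted label) pair."""
    return Counter(
        (r.framing, r.predicted) for r in records if r.predicted != r.gold
    )


# Toy data, invented purely to exercise the functions.
records = [
    Record("more", "less", "more"),     # wrong, matches the framing term
    Record("more", "more", "more"),     # correct
    Record("less", "more", "less"),     # wrong, matches the framing term
    Record("less", "more", "equal"),    # wrong, does not match the framing term
    Record("equal", "equal", "equal"),  # correct
]

print(steering_rate(records))           # 2 of the 3 errors follow the framing term
print(error_direction_counts(records))
```

If errors were independent of framing, a measure like this would be expected to sit near the rate implied by chance among the wrong labels; values well above that would be consistent with the directional drift the abstract describes.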

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Computational and Text Analysis Methods