Reasoning Model Is Superior LLM-Judge, Yet Suffers from Biases

Hui Huang; Xuanxin Wu; Muyun Yang; Yuki Arase

arXiv:2601.03630 · cs.CL · May 15, 2026

TL;DR

This paper systematically compares Large Reasoning Models (LRMs) and non-reasoning LLMs, showing LRMs excel in judgment accuracy and robustness but still suffer from biases, which can be mitigated by the proposed PlanJudge strategy.

Contribution

This is the first systematic comparison of LRMs and non-reasoning LLMs as judges. It documents both the advantages and the remaining biases of LRMs, and it introduces PlanJudge, a lightweight strategy for mitigating those biases.

Findings

01

LRMs outperform non-reasoning LLMs in judgment accuracy, particularly on reasoning-intensive tasks.

02

LRMs follow evaluation instructions more closely and are more robust to adversarial attacks that target judgment tasks.

03

LRMs still exhibit strong evaluation biases; PlanJudge significantly mitigates these biases while preserving overall judgment accuracy.

Abstract

This paper presents the first systematic comparison investigating whether Large Reasoning Models (LRMs) are superior judges to non-reasoning LLMs. Our empirical analysis yields four key findings: 1) LRMs outperform non-reasoning LLMs in terms of judgment accuracy, particularly on reasoning-intensive tasks; 2) LRMs demonstrate superior evaluation instruction-following capabilities; 3) LRMs exhibit enhanced robustness against adversarial attacks targeting judgment tasks; 4) However, LRMs still exhibit strong evaluation biases. To mitigate this bias vulnerability, we propose PlanJudge, a lightweight evaluation strategy that prompts the model to generate an explicit evaluation plan before executing the judgment. Despite its simplicity, our experiments demonstrate that PlanJudge significantly mitigates biases in LLM-as-a-Judge while preserving overall judgment accuracy.
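
The plan-then-judge idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the prompt wording, the two-call structure (the paper may use a single prompt), the choice to build the plan from the instruction alone, the pairwise A/B setup, and the names `plan_then_judge`, `parse_verdict`, and `call_model`. Here `call_model` stands for any function that maps a prompt string to a completion string.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical prompt templates; the wording is illustrative, not the paper's.
PLAN_TEMPLATE = (
    "You will later judge two responses to the instruction below.\n"
    "First, write a short numbered evaluation plan listing the criteria "
    "you will check. Do not judge anything yet.\n\n"
    "Instruction:\n{instruction}\n"
)

JUDGE_TEMPLATE = (
    "Follow the evaluation plan step by step to compare the two responses.\n"
    "End with a final line of the form 'Verdict: A' or 'Verdict: B'.\n\n"
    "Instruction:\n{instruction}\n\n"
    "Evaluation plan:\n{plan}\n\n"
    "Response A:\n{response_a}\n\n"
    "Response B:\n{response_b}\n"
)


def parse_verdict(judgment: str) -> Optional[str]:
    """Return 'A' or 'B' from the last 'Verdict: X' marker, or None if absent."""
    matches = re.findall(r"Verdict:\s*([AB])\b", judgment)
    return matches[-1] if matches else None


def plan_then_judge(
    call_model: Callable[[str], str],
    instruction: str,
    response_a: str,
    response_b: str,
) -> Tuple[str, Optional[str]]:
    """Ask for an explicit evaluation plan first, then judge against that plan."""
    # Step 1: generate the plan from the instruction only (an assumption).
    plan = call_model(PLAN_TEMPLATE.format(instruction=instruction))
    # Step 2: execute the judgment with the plan included in the prompt.
    judgment = call_model(
        JUDGE_TEMPLATE.format(
            instruction=instruction,
            plan=plan,
            response_a=response_a,
            response_b=response_b,
        )
    )
    return plan, parse_verdict(judgment)
```

In use, `call_model` would wrap whichever LLM client is available. Returning the plan alongside the verdict keeps the stated criteria available for inspection, and a `None` verdict signals that the model did not follow the requested output format.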
