Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training

Yixin Liu; Yue Yu; DiJia Su; Sid Wang; Xuewei Wang; Song Jiang; Bo Liu; Arman Cohan; Yuandong Tian; Zhengxing Chen

arXiv:2603.12246 · cs.AI · March 13, 2026

Open Access

TL;DR

This paper compares reasoning and non-reasoning LLM judges as reward sources in reinforcement-learning-based alignment. Policies trained with reasoning judges achieve strong performance, but they can also produce adversarial outputs that deceive other judges, so reasoning judges show both promise and risk.

Contribution

The study systematically compares reasoning and non-reasoning LLM judges in RL-based alignment, showing where reasoning judges help and where they remain vulnerable in non-verifiable domains.

Findings

01. Reasoning judges lead to policies that perform well on benchmarks.

02. Non-reasoning judges are prone to reward hacking.

03. Policies trained with reasoning judges can generate adversarial outputs that deceive other judges.

Abstract

Reasoning LLMs-as-Judges, which can benefit from inference-time scaling, provide a promising path for extending the success of reasoning models to non-verifiable domains where the output correctness/quality cannot be directly checked. However, while reasoning judges have shown better performance on static evaluation benchmarks, their effectiveness in actual policy training has not been systematically examined. Therefore, we conduct a rigorous study to investigate the actual impact of non-reasoning and reasoning judges in reinforcement-learning-based LLM alignment. Our controlled synthetic setting, where a "gold-standard" judge (gpt-oss-120b) provides preference annotations to train smaller judges, reveals key differences between non-reasoning and reasoning judges: non-reasoning judges lead to reward hacking easily, while reasoning judges can lead to policies that achieve strong…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI