Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge
Wenbo Zhang, Lijinghua Zhang, Liner Xiang, Hengrui Cai

TL;DR
This paper evaluates when reasoning improves LLM-based judgments, introduces RACER to adaptively select judgment strategies under budget constraints, and demonstrates its effectiveness in handling distribution shifts.
Contribution
The paper provides a controlled comparison of reasoning and non-reasoning judges and proposes RACER, a novel adaptive routing method with theoretical guarantees.
Findings
Reasoning improves accuracy on structured tasks like math and coding.
Explicit reasoning incurs significantly higher computational cost and offers limited or even negative gains on simpler evaluations.
RACER achieves better accuracy-cost trade-offs under distribution shift.
Abstract
Reasoning-capable large language models (LLMs) have recently been adopted as automated judges, but their benefits and costs in LLM-as-a-Judge settings remain unclear. Through controlled comparisons between reasoning and non-reasoning judges, we show that explicit reasoning substantially improves judgment accuracy on tasks requiring structured verification (e.g., math and coding), while offering limited or even negative gains on simpler evaluations and incurring significantly higher computational cost. These findings suggest that reasoning should be used selectively rather than universally, with awareness of possible distribution shift. We propose Robust Adaptive Cost-Efficient Routing (RACER), which dynamically selects between reasoning and non-reasoning judges under a fixed budget by formulating routing as a constrained distributionally robust optimization problem. RACER explicitly…
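To make the idea of budget-constrained routing concrete, here is a minimal Python sketch. It is not RACER and does not implement the paper's distributionally robust formulation; it is a simple greedy heuristic written for illustration. The names `Query`, `est_gain`, `extra_cost`, `margin`, and `route_under_budget` are all hypothetical, and the `margin` parameter is only a crude stand-in for robustness: it discounts every estimated gain so that queries are sent to the reasoning judge only if the benefit survives a pessimistic adjustment.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    est_gain: float    # assumed: estimated accuracy gain from using the reasoning judge
    extra_cost: float  # assumed: extra cost of the reasoning judge (must be > 0)


def route_under_budget(queries, budget, margin=0.0):
    """Greedily pick queries for the reasoning judge under a total extra-cost budget.

    Each estimated gain is reduced by `margin` (a pessimistic discount), and
    queries are ranked by discounted gain per unit of extra cost. Queries not
    returned are assumed to go to the cheaper non-reasoning judge.
    """
    ranked = sorted(
        queries,
        key=lambda q: (q.est_gain - margin) / q.extra_cost,
        reverse=True,
    )
    chosen, spent = set(), 0.0
    for q in ranked:
        if q.est_gain - margin <= 0:
            continue  # reasoning is not expected to help after the discount
        if spent + q.extra_cost <= budget:
            chosen.add(q.name)
            spent += q.extra_cost
    return chosen, spent


# Illustrative, made-up numbers (not results from the paper).
example_queries = [
    Query("math", est_gain=0.20, extra_cost=4.0),
    Query("code", est_gain=0.15, extra_cost=5.0),
    Query("chat", est_gain=0.01, extra_cost=3.0),
    Query("safety", est_gain=-0.02, extra_cost=3.0),
]

chosen, spent = route_under_budget(example_queries, budget=9.0, margin=0.05)
print(sorted(chosen), spent)
```

With these made-up inputs, the structured tasks are routed to the reasoning judge and the simpler ones are left to the non-reasoning judge, mirroring the selective-use argument in the abstract. A greedy ratio rule is only a heuristic for this knapsack-style problem; the paper's constrained optimization may differ substantially.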
