A Rubric-Supervised Critic from Sparse Real-World Outcomes

Xingyao Wang; Valerie Chen; Heng Ji; Graham Neubig

arXiv:2603.03800 · cs.AI · March 5, 2026

TL;DR

This paper introduces a rubric-supervised critic model trained on sparse, noisy real-world interaction data to improve coding agent evaluation, training, and inference, bridging the gap between academic benchmarks and real-world scenarios.

Contribution

It presents Critic Rubrics, a rubric-based supervision framework with 24 behavioral features derived from human-agent interaction traces, which makes it possible to train a critic from sparse and noisy data.

Findings

01. Improves best-of-N reranking performance (+15.9)

02. Enables early stopping with fewer attempts (+17.7, 83% fewer)

03. Supports data curation via critic-selected trajectories
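
The reranking and early-stopping findings above can be illustrated with a minimal sketch. It assumes the critic is exposed as a function mapping a trajectory to a score, where higher means more likely to succeed. The names `best_of_n`, `early_stop`, `score`, `threshold`, and `max_attempts` are hypothetical and are not taken from the paper or from the released model's interface.

```python
from itertools import islice
from typing import Callable, Iterable, List, Optional, Tuple


def best_of_n(candidates: List[str], score: Callable[[str], float]) -> str:
    """Best-of-N reranking: return the candidate the critic scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def early_stop(
    attempts: Iterable[str],
    score: Callable[[str], float],
    threshold: float,
    max_attempts: int,
) -> Tuple[Optional[str], int]:
    """Sample attempts one at a time and stop once the critic is satisfied.

    Returns the best attempt seen and how many attempts were consumed.
    `threshold` is an assumed tuning knob, not a value from the paper.
    """
    best, best_score, used = None, float("-inf"), 0
    for attempt in islice(attempts, max_attempts):
        used += 1
        s = score(attempt)
        if s > best_score:
            best, best_score = attempt, s
        if s >= threshold:
            break  # the critic accepts this attempt, so skip the remaining ones
    return best, used
```

With a lazily generated stream of attempts, `early_stop` only pays for the attempts it actually inspects, which is where the savings in attempts would come from in this sketch.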

Abstract

Academic benchmarks for coding agents tend to reward autonomous task completion, measured by verifiable rewards such as unit-test success. In contrast, real-world coding agents operate with humans in the loop, where success signals are typically noisy, delayed, and sparse. How can we bridge this gap? In this paper, we propose a process to learn a "critic" model from sparse and noisy interaction data, which can then be used as a reward model for either RL-based training or inference-time scaling. Specifically, we introduce Critic Rubrics, a rubric-based supervision framework with 24 behavioral features that can be derived from human-agent interaction traces alone. Using a semi-supervised objective, we can then jointly predict these rubrics and sparse human feedback (when present). In experiments, we demonstrate that, despite being trained primarily from trace-observable rubrics and…
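
One way to picture a semi-supervised objective of the kind the abstract describes is a loss that always includes the rubric predictions and adds a human-feedback term only when a label exists. The sketch below is an assumption for illustration, not the paper's actual loss: it uses binary cross-entropy for both terms, and `joint_loss`, `outcome_weight`, and the treatment of feedback as a binary label are hypothetical choices.

```python
import math
from typing import Optional, Sequence


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted probability and one 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def joint_loss(
    rubric_probs: Sequence[float],
    rubric_labels: Sequence[int],
    outcome_prob: float,
    outcome_label: Optional[int],
    outcome_weight: float = 1.0,  # assumed weighting, not from the paper
) -> float:
    """Mean rubric loss plus an outcome term that is masked when unlabeled."""
    if len(rubric_probs) != len(rubric_labels) or not rubric_labels:
        raise ValueError("rubric predictions and labels must align")
    rubric_term = sum(
        bce(p, y) for p, y in zip(rubric_probs, rubric_labels)
    ) / len(rubric_labels)
    if outcome_label is None:
        # No human feedback for this trace: learn from the rubrics alone.
        return rubric_term
    return rubric_term + outcome_weight * bce(outcome_prob, outcome_label)
```

In the paper's setting the rubric vector would have 24 entries; the function accepts any length so that small examples stay readable.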

Code & Models

Models

🤗
OpenHands/openhands-critic-4b-v1.0
model · 343 downloads · 6 likes

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications