RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
Gaotang Li, Bhavana Dalvi Mishra, Zifeng Wang, Jun Yan, Yanfei Chen, Chun-Liang Li, Long T. Le, Rujun Han, George Lee, Hanghang Tong, Chen-Yu Lee, Tomas Pfister

TL;DR
RubricEM is a rubric-guided reinforcement learning framework for deep research agents. It decomposes the policy into stages, uses rubric feedback for dense credit assignment, and trains a reflection meta-policy that turns past trajectories into reusable guidance.
Contribution
The paper proposes a rubric-guided RL approach that combines stagewise policy decomposition with reflection-based meta-policy evolution for long-form research tasks.
Findings
RubricEM-8B outperforms comparable open models on research benchmarks.
Stagewise rubric judgments provide denser semantic feedback than a single final-answer evaluation.
Reflection meta-policy distills trajectories into reusable guidance.
Abstract
Training deep research agents, namely systems that plan, search, evaluate evidence, and synthesize long-form reports, pushes reinforcement learning beyond the regime of verifiable rewards. Their outputs lack ground-truth answers, their trajectories span many tool-augmented decisions, and standard post-training offers little mechanism for turning past attempts into reusable experience. In this work, we argue that rubrics should serve not merely as final-answer evaluators, but as the shared interface that structures policy execution, judge feedback, and agent memory. Based on this view, we introduce RubricEM, a rubric-guided reinforcement learning framework that combines stagewise policy decomposition with reflection-based meta-policy evolution. RubricEM first makes research trajectories stage-aware by conditioning planning, evidence gathering, review, and synthesis on self-generated…
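To make the idea of stagewise rubric feedback concrete, here is a minimal sketch of how per-stage rubric judgments could yield a denser reward signal than one score on the final report. It is an illustration only, not the paper's implementation: the four stage names come from the abstract, while `RubricItem`, the weights, the weighted-fraction scoring rule, and the example criteria are all hypothetical.

```python
from dataclasses import dataclass

# Stage names taken from the abstract; everything else below is assumed.
STAGES = ("planning", "evidence_gathering", "review", "synthesis")


@dataclass
class RubricItem:
    """One rubric criterion with a hypothetical judge verdict."""
    criterion: str
    weight: float
    satisfied: bool


def stage_score(items):
    """Weighted fraction of satisfied criteria (an assumed scoring rule)."""
    total = sum(item.weight for item in items)
    if total == 0:
        return 0.0
    return sum(item.weight for item in items if item.satisfied) / total


def stagewise_rewards(judgments):
    """One reward per stage: every stage receives its own rubric signal."""
    return {stage: stage_score(judgments.get(stage, [])) for stage in STAGES}


def terminal_only_rewards(judgments):
    """Baseline for contrast: only the final synthesis stage is scored."""
    rewards = {stage: 0.0 for stage in STAGES}
    rewards["synthesis"] = stage_score(judgments.get("synthesis", []))
    return rewards


# Hypothetical judgments for a single research trajectory.
judgments = {
    "planning": [
        RubricItem("covers the key sub-questions", 1.0, True),
        RubricItem("states a search strategy", 1.0, False),
    ],
    "evidence_gathering": [RubricItem("cites primary sources", 2.0, True)],
    "review": [RubricItem("flags conflicting evidence", 1.0, False)],
    "synthesis": [
        RubricItem("answers the question", 3.0, True),
        RubricItem("is well organized", 1.0, True),
    ],
}

dense = stagewise_rewards(judgments)
sparse = terminal_only_rewards(judgments)
print(dense)
print(sparse)
```

In this toy trajectory the final report satisfies its whole rubric, so a terminal-only reward cannot distinguish the weak planning and review stages from the strong ones, whereas the stagewise scores can. How RubricEM actually aggregates judgments and assigns credit is not specified in the text above.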
