From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation

Yibin Liu; Yaxing Lyu; Daqi Gao; Zhixuan Liang; Weiliang Tang; Shilong Mu; Xiaokang Yang; and Yao Mu

arXiv:2603.15600 · cs.RO · March 17, 2026

TL;DR

This paper introduces PRIMO R1, a reinforcement learning framework that transforms video language models into active critics for robotic manipulation, significantly improving process reasoning and failure detection accuracy.

Contribution

The paper presents PRIMO R1, a novel reinforcement learning approach that enables video MLLMs to generate explicit process reasoning and improve robotic manipulation supervision.

Findings

01. 50% reduction in reasoning error compared to baselines

02. State-of-the-art 67.0% accuracy on the RoboFail benchmark

03. Strong zero-shot generalization to failure detection

Abstract

Accurate process supervision remains a critical challenge for long-horizon robotic manipulation. A primary bottleneck is that current video MLLMs, trained primarily under a Supervised Fine-Tuning (SFT) paradigm, function as passive "Observers" that recognize ongoing events rather than evaluating the current state relative to the final task goal. In this paper, we introduce PRIMO R1 (Process Reasoning Induced Monitoring), a 7B framework that transforms video MLLMs into active "Critics". We leverage outcome-based Reinforcement Learning to incentivize explicit Chain-of-Thought generation for progress estimation. Furthermore, our architecture constructs a structured temporal input by explicitly anchoring the video sequence between initial and current state images. Supported by the proposed PRIMO Dataset and Benchmark, extensive experiments across diverse in-domain environments and…
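To make the idea of an outcome-based reward for progress estimation concrete, here is a minimal sketch. It is not the paper's actual reward: the `<answer>` tag format, the 0 to 100 progress scale, the linear error penalty, and the function names `parse_progress` and `outcome_reward` are all assumptions chosen for illustration. The point it shows is that only the final estimate is scored, so any Chain-of-Thought text before the answer is rewarded indirectly, through the accuracy of the outcome it leads to.

```python
import re
from typing import Optional

# Assumed (hypothetical) output format: free-form reasoning followed by
# a final estimate wrapped in <answer>...</answer>, as a percentage.
ANSWER_RE = re.compile(r"<answer>\s*(\d+(?:\.\d+)?)\s*%?\s*</answer>")


def parse_progress(completion: str) -> Optional[float]:
    """Return the last predicted progress value in [0, 100], or None."""
    matches = ANSWER_RE.findall(completion)
    if not matches:
        return None
    value = float(matches[-1])
    return value if 0.0 <= value <= 100.0 else None


def outcome_reward(completion: str, true_progress: float) -> float:
    """Score only the final estimate; the reasoning text is never inspected.

    Assumed shaping: 1.0 for an exact match, decreasing linearly with the
    absolute error, and 0.0 when no valid answer can be parsed.
    """
    predicted = parse_progress(completion)
    if predicted is None:
        return 0.0
    return 1.0 - abs(predicted - true_progress) / 100.0


if __name__ == "__main__":
    sample = (
        "The gripper has grasped the cup but not yet lifted it. "
        "<answer>40</answer>"
    )
    print(outcome_reward(sample, true_progress=60.0))
```

A reward of this shape gives the policy no credit for well-formed reasoning on its own, which is the sense in which the supervision is outcome-based rather than process-based.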

Taxonomy

Topics: Human Pose and Action Recognition · Reinforcement Learning in Robotics · Robot Manipulation and Learning