Move-Then-Operate: Behavioral Phasing for Human-Like Robotic Manipulation
Haoming Xu, Lei Lei, Jie Gu, Chu Tang, Jingmin Chen, Ruiqi Wang

TL;DR
The paper introduces Move-Then-Operate, a dual-phase robotic manipulation framework that improves success rates and training efficiency by explicitly separating movement and contact phases with a learnable phase selector.
Contribution
It proposes a novel dual-expert policy architecture with automatic phase labeling, enhancing manipulation performance and data efficiency over monolithic approaches.
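To make the dual-expert idea concrete, here is a minimal sketch of hard routing between two experts, not the authors' implementation. The `PhaseSelector` and `DualExpertPolicy` classes, the single distance feature, the weights, and the toy experts are all hypothetical. In the paper the selector is learned and the experts are full policies.

```python
import math
from typing import Callable, List, Sequence, Tuple

Action = List[float]
Expert = Callable[[Sequence[float]], Action]


class PhaseSelector:
    """Hypothetical linear selector that outputs p(operate | features)."""

    def __init__(self, weights: Sequence[float], bias: float = 0.0) -> None:
        self.weights = list(weights)
        self.bias = bias

    def operate_probability(self, features: Sequence[float]) -> float:
        logit = self.bias + sum(w * f for w, f in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-logit))  # logistic sigmoid


class DualExpertPolicy:
    """Routes each observation to exactly one of two phase-specific experts."""

    def __init__(self, selector: PhaseSelector, move_expert: Expert,
                 operate_expert: Expert, threshold: float = 0.5) -> None:
        self.selector = selector
        self.move_expert = move_expert
        self.operate_expert = operate_expert
        self.threshold = threshold

    def act(self, features: Sequence[float]) -> Tuple[str, Action]:
        p_operate = self.selector.operate_probability(features)
        if p_operate >= self.threshold:
            return "operate", self.operate_expert(features)
        return "move", self.move_expert(features)


# Toy experts (assumptions): the only feature is distance to the target, in metres.
def move_expert(features: Sequence[float]) -> Action:
    return [min(features[0], 0.1)]   # coarse step, capped at 10 cm


def operate_expert(features: Sequence[float]) -> Action:
    return [min(features[0], 0.01)]  # fine step, capped at 1 cm


# Hand-set weights stand in for a trained selector: small distance favours "operate".
policy = DualExpertPolicy(PhaseSelector(weights=[-10.0], bias=1.0),
                          move_expert, operate_expert)

far_phase, far_action = policy.act([0.5])
near_phase, near_action = policy.act([0.02])
```

Hard routing keeps each expert's outputs separate, which is one simple way to realise the phase isolation the paper describes. A soft mixture of the two experts would be an alternative design.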
Findings
Achieves a 68.9% average success rate on the RoboTwin2 benchmark.
Outperforms the monolithic baseline by 24%.
Reaches peak performance in 40% fewer training steps.
Abstract
We present Move-Then-Operate, a vision-language-action (VLA) framework that explicitly decouples robotic manipulation into two distinct behavioral phases: coarse relocation (move) and contact-critical interaction (operate). Unlike monolithic policies that conflate these heterogeneous regimes, our architecture employs a dual-expert policy routed by a learnable phase selector, introducing a structural inductive bias that isolates phase-specific dynamics. Phase labels are automatically generated via an MLLM-based pipeline conditioned on lightweight contextual cues, such as end-effector velocity and subtask decomposition, to ensure alignment with human motor patterns. Evaluated on the RoboTwin2 benchmark, our method achieves an average success rate of 68.9%, outperforming the monolithic baseline by 24%. It matches or exceeds models trained on more data and reaches peak…
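The abstract names end-effector velocity as one of the contextual cues given to the labeling pipeline. The sketch below shows one plausible way to compute such a cue from logged end-effector positions using finite differences. The function names, the sampling interval, and the 0.5 m/s threshold are illustrative assumptions, and the threshold flag is only a stand-in cue: in the paper the phase labels come from the MLLM-based pipeline, not from a fixed threshold.

```python
import math
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]


def end_effector_speeds(positions: Sequence[Position], dt: float) -> List[float]:
    """Finite-difference speed (m/s) between consecutive end-effector positions."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return [math.dist(prev, curr) / dt
            for prev, curr in zip(positions, positions[1:])]


def low_speed_flags(speeds: Sequence[float], threshold: float) -> List[bool]:
    """True where the end effector moves slower than the (hypothetical) threshold."""
    return [speed < threshold for speed in speeds]


# Toy trajectory (assumption): one fast transit step, then one slow approach step.
trajectory = [(0.0, 0.0, 0.0), (0.3, 0.4, 0.0), (0.3, 0.4, 0.01)]
speeds = end_effector_speeds(trajectory, dt=0.1)
flags = low_speed_flags(speeds, threshold=0.5)
```

Here the first step covers 0.5 m in 0.1 s and the second covers 0.01 m, so only the second step is flagged as slow. A cue like this could be passed to the labeler alongside the subtask decomposition.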
