Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation

Jiahua Ma; Yiran Qin; Xin Wen; Yixiong Li; Yuyu Sun; Yulan Guo; Liang Lin; Ruimao Zhang

arXiv:2604.05544·cs.RO·April 8, 2026

TL;DR

This paper introduces ReV, a closed-loop visuomotor policy framework for robotic manipulation that adaptively re-plans trajectories in real time, improving robustness to unforeseen circumstances without requiring extra data.

Contribution

ReV is a novel framework that integrates sparse referring points into visuomotor policies for dynamic, real-time trajectory adaptation during manipulation tasks.

Findings

01 ReV achieves higher success rates in simulated tasks.

02 ReV demonstrates robustness in real-world manipulation scenarios.

03 ReV operates without additional data or fine-tuning.

Abstract

This paper addresses a fundamental problem in visuomotor policy learning for robotic manipulation: how to improve robustness to out-of-distribution execution errors and support dynamic trajectory re-routing when the model is trained solely on the original expert demonstrations. We introduce the Referring-Aware Visuomotor Policy (ReV), a closed-loop framework that adapts to unforeseen circumstances by instantly incorporating sparse referring points provided by a human or a high-level reasoning planner. Specifically, ReV leverages coupled diffusion heads to preserve standard task execution patterns while seamlessly integrating sparse referring points via a trajectory-steering strategy. Upon receiving a specific referring point, the global diffusion head first generates a sequence of globally consistent yet temporally sparse action anchors, while identifying the precise temporal…
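
To make the idea of steering a trajectory through a sparse referring point concrete, here is a toy sketch in plain Python. It is not ReV's method: there is no diffusion model, the anchors are hand-written 2D points, and the rule for where the referring point lands in time (it replaces the nearest anchor) is an assumption made for illustration, since the abstract is truncated before that detail. The names `steer_anchors` and `densify` are hypothetical, and linear interpolation merely stands in for whatever dense action generation the framework performs.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def steer_anchors(anchors: Sequence[Point], ref: Point) -> Tuple[List[Point], int]:
    """Replace the anchor nearest to `ref` with `ref` (assumed placement rule).

    Returns the steered anchor list and the index that was replaced.
    """
    idx = min(range(len(anchors)), key=lambda i: math.dist(anchors[i], ref))
    steered = list(anchors)
    steered[idx] = tuple(ref)
    return steered, idx


def densify(anchors: Sequence[Point], steps_per_segment: int) -> List[Point]:
    """Fill in dense waypoints between consecutive anchors by linear interpolation."""
    dense: List[Point] = []
    for a, b in zip(anchors, anchors[1:]):
        for k in range(steps_per_segment):
            t = k / steps_per_segment
            dense.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    dense.append(tuple(anchors[-1]))  # include the final anchor itself
    return dense


# Sparse anchors for a nominal straight-line motion (hand-written placeholders).
anchors = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
# A sparse referring point, e.g. supplied by a human or a planner.
ref = (1.2, 0.5)

steered, idx = steer_anchors(anchors, ref)
trajectory = densify(steered, steps_per_segment=4)
```

In this toy version the dense trajectory passes exactly through the referring point while the untouched anchors keep the rest of the nominal motion intact, which is the intuition behind preserving standard execution patterns while integrating a sparse correction.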
