SViP: Sequencing Bimanual Visuomotor Policies with Object-Centric Motion Primitives

Yizhou Chen, Hang Xu, Dongjie Yu, Zeqing Zhang, Yi Ren, and Jia Pan

arXiv:2506.18825 · cs.RO · June 24, 2025

TL;DR

SViP is a framework that combines visuomotor policies with task and motion planning, enabling generalization to new tasks and conditions from a small number of demonstrations without relying on object pose estimation.

Contribution

The paper presents SViP, a method that integrates visuomotor policies into task and motion planning (TAMP) using scene graph-based decision variables, improving generalization and robustness in bimanual manipulation tasks.

Findings

01

SViP generalizes to out-of-distribution initial conditions.

02

SViP achieves successful task execution with only 20 demonstrations.

03

SViP outperforms state-of-the-art imitation learning methods.

Abstract

Imitation learning (IL), particularly when leveraging high-dimensional visual inputs for policy training, has proven intuitive and effective in complex bimanual manipulation tasks. Nonetheless, the generalization capability of visuomotor policies remains limited, especially when only small demonstration datasets are available. Moreover, accumulated errors in visuomotor policies significantly hinder their ability to complete long-horizon tasks. To address these limitations, we propose SViP, a framework that seamlessly integrates visuomotor policies into task and motion planning (TAMP). SViP partitions human demonstrations into bimanual and unimanual operations using a semantic scene graph monitor. Continuous decision variables from the key scene graph are employed to train a switching condition generator. This generator produces parameterized scripted primitives that ensure reliable performance even…
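The partitioning step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scene graph encoding as a set of (subject, relation, object) triples, the relation name `grasps`, the arm names, the labeling rule, and the helper names `segment_by_scene_graph` and `label_segment` are all assumptions made for illustration. The idea shown is only that a demonstration can be cut wherever the per-frame scene graph changes, and that each resulting segment can be labeled by how many arms are engaged.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: a scene graph is a set of (subject, relation, object) triples.
Edge = Tuple[str, str, str]
SceneGraph = FrozenSet[Edge]


def segment_by_scene_graph(
    graphs: List[SceneGraph],
) -> List[Tuple[int, int, SceneGraph]]:
    """Split a per-frame scene graph sequence into maximal runs of identical graphs.

    Returns (start, end_exclusive, graph) for each run.
    """
    segments = []
    start = 0
    for t in range(1, len(graphs) + 1):
        # Close the current run at the end of the sequence or when the graph changes.
        if t == len(graphs) or graphs[t] != graphs[start]:
            segments.append((start, t, graphs[start]))
            start = t
    return segments


def label_segment(
    graph: SceneGraph, arms: Tuple[str, str] = ("left_arm", "right_arm")
) -> str:
    """Label a segment by the number of arms that hold an object (assumed rule)."""
    active = {subj for (subj, rel, _obj) in graph if subj in arms and rel == "grasps"}
    if len(active) == 2:
        return "bimanual"
    if len(active) == 1:
        return "unimanual"
    return "free"


# Toy demonstration: 7 frames, with made-up objects "cup" and "lid".
g0: SceneGraph = frozenset()
g1: SceneGraph = frozenset({("left_arm", "grasps", "cup")})
g2: SceneGraph = frozenset(
    {("left_arm", "grasps", "cup"), ("right_arm", "grasps", "lid")}
)
demo = [g0, g0, g1, g1, g1, g2, g2]

segments = segment_by_scene_graph(demo)
labels = [label_segment(graph) for (_start, _end, graph) in segments]
```

In this toy sequence the frames fall into three runs, one per distinct scene graph. A real monitor would build the graphs from perception rather than from hand-written triples, and the paper's own criterion for bimanual versus unimanual operations may differ from the grasp-count rule assumed here.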
