VLM-SFD: VLM-Assisted Siamese Flow Diffusion Framework for Dual-Arm Cooperative Manipulation
Jiaming Chen, Yiyu Jiang, Aoshen Huang, Yang Li, Wei Pan

TL;DR
This paper introduces VLM-SFD, a novel framework combining Siamese flow diffusion and vision-language models to enable dual-arm robots to adaptively learn and generalize complex manipulation tasks from minimal demonstrations.
Contribution
The paper presents a new VLM-assisted Siamese flow diffusion framework with a dual-encoder-decoder architecture and dynamic task assignment for improved dual-arm manipulation.
Findings
Demonstrates strong generalization across diverse tasks
Achieves high efficiency with minimal demonstrations
Effectively integrates vision-language models for adaptive control
Abstract
Dual-arm cooperative manipulation holds great promise for tackling complex real-world tasks that demand seamless coordination and adaptive dynamics. Despite substantial progress in learning-based motion planning, most approaches struggle to generalize across diverse manipulation tasks and adapt to dynamic, unstructured environments, particularly in scenarios involving interactions between two objects such as assembly, tool use, and bimanual grasping. To address these challenges, we introduce a novel VLM-Assisted Siamese Flow Diffusion (VLM-SFD) framework for efficient imitation learning in dual-arm cooperative manipulation. The proposed VLM-SFD framework exhibits outstanding adaptability, significantly enhancing the ability to rapidly adapt and generalize to diverse real-world tasks from only a minimal number of human demonstrations. Specifically, we propose a Siamese Flow Diffusion…
Taxonomy
Topics: Teleoperation and Haptic Systems · Robot Manipulation and Learning
