VLM-SFD: VLM-Assisted Siamese Flow Diffusion Framework for Dual-Arm Cooperative Manipulation

Jiaming Chen; Yiyu Jiang; Aoshen Huang; Yang Li; Wei Pan

arXiv:2506.13428 · cs.RO · November 24, 2025

Open Access

TL;DR

This paper introduces VLM-SFD, a novel framework combining Siamese flow diffusion and vision-language models to enable dual-arm robots to adaptively learn and generalize complex manipulation tasks from minimal demonstrations.

Contribution

The paper presents a new VLM-assisted Siamese flow diffusion framework with a dual-encoder-decoder architecture and dynamic task assignment for improved dual-arm manipulation.

Findings

1. Demonstrates strong generalization across diverse tasks
2. Achieves high efficiency with minimal demonstrations
3. Effectively integrates vision-language models for adaptive control

Abstract

Dual-arm cooperative manipulation holds great promise for tackling complex real-world tasks that demand seamless coordination and adaptive dynamics. Despite substantial progress in learning-based motion planning, most approaches struggle to generalize across diverse manipulation tasks and to adapt to dynamic, unstructured environments, particularly in scenarios involving interactions between two objects, such as assembly, tool use, and bimanual grasping. To address these challenges, we introduce a novel VLM-Assisted Siamese Flow Diffusion (VLM-SFD) framework for efficient imitation learning in dual-arm cooperative manipulation. The proposed framework adapts rapidly and generalizes to diverse real-world tasks from only a small number of human demonstrations. Specifically, we propose a Siamese Flow Diffusion…

Taxonomy

Topics: Teleoperation and Haptic Systems · Robot Manipulation and Learning