Diffusion-Based Imaginative Coordination for Bimanual Manipulation

Huilin Xu; Jian Ding; Jiakun Xu; Ruixiang Wang; Jun Chen; Jinjie Mai; Yanwei Fu; Bernard Ghanem; Feng Xu; Mohamed Elhoseiny

arXiv:2507.11296·cs.RO·July 16, 2025

TL;DR

This paper introduces a diffusion-based framework for bimanual manipulation in robotics, combining video and action prediction to improve coordination and success rates in complex tasks.

Contribution

The paper proposes a multi-frame latent prediction strategy and a unidirectional attention mechanism to improve bimanual coordination.

Findings

01 Achieved a 24.9% increase in success rate on the ALOHA benchmark.

02 Improved success rate on RoboTwin by 11.1%.

03 Increased real-world task success by 32.5%.

Abstract

Bimanual manipulation is crucial in robotics, enabling complex tasks in industrial automation and household services. However, it poses significant challenges due to the high-dimensional action space and intricate coordination requirements. While video prediction has been recently studied for representation learning and control, leveraging its ability to capture rich dynamic and behavioral information, its potential for enhancing bimanual coordination remains underexplored. To bridge this gap, we propose a unified diffusion-based framework for the joint optimization of video and action prediction. Specifically, we propose a multi-frame latent prediction strategy that encodes future states in a compressed latent space, preserving task-relevant features. Furthermore, we introduce a unidirectional attention mechanism where video prediction is conditioned on the action, while action…
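The unidirectional attention idea mentioned in the abstract can be illustrated with an attention mask. The following is a minimal sketch, not the authors' implementation: the abstract is cut off before it describes the action side, so the sketch assumes that action tokens attend only to other action tokens while video tokens attend to both. The function name `unidirectional_mask`, the token ordering (action tokens first), and the token counts are hypothetical choices made for illustration.

```python
def unidirectional_mask(num_action: int, num_video: int) -> list[list[bool]]:
    """Build a boolean attention mask; mask[q][k] is True when query q may attend to key k.

    Assumed token order (hypothetical): action tokens first, then video tokens.
    """
    total = num_action + num_video
    mask = []
    for q in range(total):
        q_is_action = q < num_action
        row = []
        for k in range(total):
            k_is_action = k < num_action
            # Every query may read action keys; only video queries may read video keys.
            row.append(k_is_action or not q_is_action)
        mask.append(row)
    return mask


mask = unidirectional_mask(num_action=2, num_video=3)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
# Prints:
# 11000
# 11000
# 11111
# 11111
# 11111
```

Under this assumption, information flows from action tokens to video tokens but not back, so the action rows of the mask never depend on the video tokens.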

Code & Models

Repositories

return-sleep/diffusion_based_imaginative_coordination (Official)

Taxonomy

Topics: Motor Control and Adaptation