Latent Action Diffusion for Cross-Embodiment Manipulation
Erik Bauer, Elvis Nava, Robert K. Katzschmann

TL;DR
This paper introduces a diffusion policy framework that operates in a latent action space shared by diverse robot end-effectors, enabling cross-embodiment manipulation and improving skill transfer across robots.
Contribution
The paper proposes a latent action space, learned with encoders trained under a contrastive loss, that enables cross-embodiment manipulation and data sharing across different robot morphologies.
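A contrastive objective of the kind described above can be sketched as a symmetric InfoNCE loss over paired latent actions. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the per-end-effector encoders have already produced embeddings, that row i of each batch comes from semantically corresponding actions on two end-effectors (how such pairs are obtained is not covered here), and the function names and temperature value are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(z_a: List[Vector], z_b: List[Vector], temperature: float = 0.1) -> float:
    """Symmetric InfoNCE loss for a batch of paired latent actions (hypothetical helper).

    Assumption: row i of z_a (e.g. a robot-hand action embedding) and row i of z_b
    (e.g. the corresponding gripper action embedding) form a positive pair; every
    other row in the batch acts as a negative.
    """
    assert len(z_a) == len(z_b) and len(z_a) > 0
    a = [normalize(v) for v in z_a]
    b = [normalize(v) for v in z_b]
    n = len(a)
    # logits[i][j] = cosine similarity between a_i and b_j, divided by the temperature
    logits = [[sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
              for i in range(n)]

    def cross_entropy_diag(rows: List[List[float]]) -> float:
        # Mean of -log softmax(row)[i]; log-sum-exp is shifted by the row max for stability
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [[logits[i][j] for i in range(n)] for j in range(n)]
    # Average the a->b and b->a directions so both encoders are pulled together
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))
```

For example, `info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])` is close to 0 because each embedding matches its partner, while swapping the rows of the second batch raises the loss to roughly 10. Minimizing this loss pushes corresponding actions from different end-effectors toward the same region of the shared latent space.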
Findings
Co-training in the latent action space improved manipulation success rates by up to 25.3%.
Learned a semantically aligned latent action space spanning anthropomorphic robotic hands, a human hand, and a parallel jaw gripper.
Enabled multi-robot control with a single unified policy.
Abstract
End-to-end learning is emerging as a powerful paradigm for robotic manipulation, but its effectiveness is limited by data scarcity and the heterogeneity of action spaces across robot embodiments. In particular, diverse action spaces across different end-effectors create barriers for cross-embodiment learning and skill transfer. We address this challenge through diffusion policies learned in a latent action space that unifies diverse end-effector actions. We first show that we can learn a semantically aligned latent action space for anthropomorphic robotic hands, a human hand, and a parallel jaw gripper using encoders trained with a contrastive loss. Second, we show that by using our proposed latent action space for co-training on manipulation data from different end-effectors, we can utilize a single policy for multi-robot control and obtain up to 25.3% improved manipulation success…
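The latent diffusion idea in the abstract can be illustrated with the standard DDPM forward process applied to a latent action z_0. This is a sketch under assumptions, not the paper's method: the linear noise schedule, the noise-prediction objective, and the per-embodiment decoders in `DECODERS` are all hypothetical, and an oracle noise estimate stands in for the learned policy network.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Latent = List[float]


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linear DDPM noise schedule (assumed; the paper's schedule is not given here)."""
    assert num_steps >= 2
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas: List[float]) -> List[float]:
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def noise_latent(z0: Latent, t: int, abar: List[float], rng: random.Random) -> Tuple[Latent, Latent]:
    """Forward process: z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I)."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    a = abar[t]
    zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]
    return zt, eps


def denoising_loss(eps_pred: Latent, eps: Latent) -> float:
    """Noise-prediction MSE that a latent diffusion policy network would minimize."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)


def recover_z0(zt: Latent, eps_pred: Latent, t: int, abar: List[float]) -> Latent:
    """Estimate z_0 by inverting the forward process with a noise estimate."""
    a = abar[t]
    return [(z - math.sqrt(1.0 - a) * e) / math.sqrt(a) for z, e in zip(zt, eps_pred)]


# Hypothetical per-embodiment decoders mapping a shared latent action to commands.
Decoder = Callable[[Latent], List[float]]
DECODERS: Dict[str, Decoder] = {
    # toy gripper opening width, clipped to [0, 1]
    "gripper": lambda z: [max(0.0, min(1.0, 0.5 + 0.5 * z[0]))],
    # toy per-joint targets for a multi-fingered hand
    "robot_hand": lambda z: [0.5 * (zi + 1.0) for zi in z],
}

if __name__ == "__main__":
    rng = random.Random(0)
    abar = alpha_bars(linear_beta_schedule(100))
    z0 = [0.2, -0.4, 0.9]
    zt, eps = noise_latent(z0, 50, abar, rng)
    z_hat = recover_z0(zt, eps, 50, abar)  # oracle noise estimate recovers z0
    for name, decode in DECODERS.items():
        print(name, decode(z_hat))
```

Because every embodiment shares one latent space in this sketch, a single denoising network can be trained on data from all end-effectors, and only the decoder chosen at execution time differs.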
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Anomaly Detection Techniques and Applications
