Learning a Unified Latent Space for Cross-Embodiment Robot Control
Yashuai Yan, Dongheui Lee

TL;DR
This paper introduces a scalable framework that learns a shared latent space for cross-embodiment robot control, enabling flexible motion transfer and goal-directed control across diverse humanoid robots without extensive re-tuning.
Contribution
The paper proposes a two-stage method that uses contrastive learning and tailored similarity metrics to unify motion across different robot morphologies, so that a policy trained in the shared latent space can be deployed directly on new robots.
Findings
Effective motion retargeting across diverse robots
Robust goal-conditioned control in a shared latent space
Easy addition of new robots with minimal retraining
Abstract
We present a scalable framework for cross-embodiment humanoid robot control by learning a shared latent representation that unifies motion across humans and diverse humanoid platforms, including single-arm, dual-arm, and legged humanoid robots. Our method proceeds in two stages: first, we construct a decoupled latent space that captures localized motion patterns across different body parts using contrastive learning, enabling accurate and flexible motion retargeting even across robots with diverse morphologies. To enhance alignment between embodiments, we introduce tailored similarity metrics that combine joint rotation and end-effector positioning for critical segments, such as arms. Then, we train a goal-conditioned control policy directly within this latent space using only human data. Leveraging a conditional variational autoencoder, our policy learns to predict latent space…
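To make the first stage more concrete, the following is a minimal sketch of how a similarity metric combining joint rotation and end-effector position might feed a contrastive (triplet) objective. It is an illustration under stated assumptions, not the authors' implementation: the function names, the quaternion-based rotation distance, the weights `w_rot` and `w_ee`, the margin value, and the assumption that end-effector positions are already normalized by limb length are all hypothetical choices not taken from the paper.

```python
import numpy as np

def quat_distance(q1, q2):
    """Rotation distance in [0, 1] between quaternions: 1 - <q1, q2>^2.

    Assumed choice of rotation metric; q and -q give distance 0 because
    they encode the same rotation.
    """
    q1 = q1 / np.linalg.norm(q1, axis=-1, keepdims=True)
    q2 = q2 / np.linalg.norm(q2, axis=-1, keepdims=True)
    dot = np.sum(q1 * q2, axis=-1)
    return 1.0 - dot ** 2

def motion_dissimilarity(rot_a, rot_b, ee_a, ee_b, w_rot=1.0, w_ee=1.0):
    """Weighted sum of a joint-rotation term and an end-effector term.

    rot_*: (num_joints, 4) quaternions for one body part (e.g. an arm).
    ee_*:  (3,) end-effector positions, assumed normalized by limb length
           so that robots of different sizes are comparable.
    """
    rot_term = float(np.mean(quat_distance(rot_a, rot_b)))
    ee_term = float(np.linalg.norm(ee_a - ee_b))
    return w_rot * rot_term + w_ee * ee_term

def triplet_loss(anchor, positive, negative, margin=0.05):
    """Standard triplet margin loss on latent vectors.

    In this sketch, the positive is the pose judged more similar to the
    anchor by motion_dissimilarity, and the negative the less similar one.
    """
    d_pos = float(np.linalg.norm(anchor - positive))
    d_neg = float(np.linalg.norm(anchor - negative))
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical data: identity rotation vs. a 180-degree rotation about x.
identity = np.array([[1.0, 0.0, 0.0, 0.0]])
flipped = np.array([[0.0, 1.0, 0.0, 0.0]])
ee = np.array([0.3, 0.0, 0.5])

same = motion_dissimilarity(identity, identity, ee, ee)
different = motion_dissimilarity(identity, flipped, ee, ee)
```

Because the abstract describes a decoupled latent space over body parts, a metric like this would presumably be evaluated per body part (for example, once per arm) rather than over the whole body; that reading is also an assumption.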
Taxonomy
Topics: Robot Manipulation and Learning · Robotic Locomotion and Control · Human Motion and Animation
