LACE: Latent Visual Representation for Cross-Embodiment Learning

Yoo Sung Jang; Kanchana Ranasinghe; Cristina Mata; Yichi Zhang; Jorge Mendez-Mendez; Michael S. Ryoo

arXiv:2605.16743·cs.RO·May 19, 2026

TL;DR

LACE is a framework that aligns human and robot visual representations in a shared latent space, enabling effective cross-embodiment learning and zero-shot transfer of robot policies using minimal demonstration data.

Contribution

LACE introduces a novel semantic alignment method leveraging shared body part correspondences and self-supervised learning to improve cross-embodiment robot learning.

Findings

01

Zero-shot transfer policies using LACE-DINO outperform the baseline by 65%.

02

LACE improves performance in low-data and out-of-distribution environments.

03

A single robot demonstration suffices to train the model.

Abstract

Cross-embodiment learning from human demonstrations is hindered by the visual gap between human and robot embodiments. While self-supervised learning (SSL) backbones encode rich inter-class semantics of general objects, we show they fail to establish correspondence between human and robot hands. We propose LACE, a framework that aligns human and robot visual representations in the latent space of these backbones by leveraging correspondences between shared body parts across embodiments as sparse supervision. These annotations can be automatically obtained via forward kinematics, and a single robot demonstration is sufficient to train the model. Our semantic alignment loss matches distributions incurred by corresponding features, lifting patch-level supervision to semantic-level alignment, while a Gram loss preserves pretrained feature quality. This alignment enables robot policies to…
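
The two loss terms named in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the formulas, so everything below is an assumption. Here the Gram loss is taken to be the mean squared difference between patch-similarity (Gram) matrices of the adapted and frozen pretrained features, and the alignment loss is taken to be a KL divergence between the similarity distributions that a human-hand patch and its corresponding robot patch induce over a hypothetical set of anchor features. The names `gram_loss`, `alignment_loss`, `anchors`, and `tau` are illustrative only.

```python
import numpy as np

def gram_matrix(features):
    """Pairwise cosine similarities between patch features of shape (N, D)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-8, None)
    return unit @ unit.T

def gram_loss(adapted, pretrained):
    """Assumed form: mean squared difference between the two Gram matrices.

    Penalizes changes in how patches relate to each other, not changes in
    the raw feature values themselves.
    """
    diff = gram_matrix(adapted) - gram_matrix(pretrained)
    return float(np.mean(diff ** 2))

def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def alignment_loss(human_feat, robot_feat, anchors, tau=0.1):
    """Assumed form: KL(p_human || p_robot) over similarities to anchors.

    human_feat, robot_feat: (D,) features of corresponding body-part patches.
    anchors: (K, D) hypothetical reference features defining the distribution.
    tau: hypothetical softmax temperature.
    """
    log_p = log_softmax(anchors @ human_feat / tau)
    log_q = log_softmax(anchors @ robot_feat / tau)
    return float(np.sum(np.exp(log_p) * (log_p - log_q)))

# Toy usage with random features standing in for backbone outputs.
rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 16))
anchors = rng.normal(size=(8, 16))
total = gram_loss(patches + 0.01, patches) + alignment_loss(
    patches[0], patches[1], anchors
)
```

Under these assumptions, matching distributions rather than raw vectors means two patches count as aligned when they relate to the rest of the feature space in the same way, which is one reading of how patch-level supervision could be lifted to semantic-level alignment.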
