Demo-JEPA: Joint-Embedding Predictive Architecture for One-shot Cross-Embodiment Imitation

Jingyang He; Guangrun Li; Jieyu Zhang; Chengkai Hou; Zhengping Che; Shanghang Zhang

arXiv:2605.20811 · cs.RO · May 21, 2026

TL;DR

Demo-JEPA is a cross-embodiment imitation framework that infers, from visual demonstrations, the future states a demonstrator is trying to reach. This lets a robot imitate a demonstrator without sharing its embodiment or action space.

Contribution

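Demo-JEPA decouples demonstration intent from embodiment-specific execution. A JEPA-based world model translates source visual demonstrations into target-compatible future latent trajectories in a shared predictive representation space, which enables imitation across different morphologies without shared action spaces.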

Findings

1. Matches the performance of specialized in-domain planners in experiments.
2. Generalizes to unseen tasks and embodiments.
3. Requires only visual demonstrations and the target agent's own interaction experience.

Abstract

Robotic imitation learning is often treated as reproducing demonstrated actions, but actions are inherently embodiment-specific. When demonstrations come from humans or robots with different morphology, kinematics, or action spaces, this action-centric view requires shared action spaces, heuristic retargeting, or large-scale multi-embodiment co-training. We instead view demonstrations as implicit specifications of future goals: the target agent should infer what state the demonstrator is trying to realize, rather than how the demonstrator executes it. We propose Demo-JEPA, a cross-embodiment imitation framework that decouples demonstration intent from embodiment-specific execution. Built on a JEPA-based world model, Demo-JEPA translates source visual demonstrations into target-compatible future latent trajectories in a shared predictive representation space. The target agent then uses…
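A deliberately tiny toy can illustrate the abstract's central idea: treat a demonstration as a sequence of goal states to reach, not as actions to copy. This sketch is not Demo-JEPA. It has no JEPA encoder and no learned predictor, and the abstract excerpt above is cut off before it says how the target agent uses the predicted latent trajectories. Every name here is a hypothetical stand-in. `encode_source` plays the role of a shared representation and keeps only the object position. `target_predictor` plays the role of the target embodiment's own dynamics model. A simple random-shooting planner, `plan_step`, picks actions in the target's own action space to reach each demonstrated latent goal in turn.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Latent = Tuple[float, float]
Action = Tuple[float, float]  # (angle in radians, push magnitude)
Predictor = Callable[[Latent, Action], Latent]


def encode_source(obs: Dict[str, float]) -> Latent:
    # Hypothetical encoder: keeps only the object's position and discards
    # the demonstrator's own body state (hand pose), i.e. "what", not "how".
    return (obs["obj_x"], obs["obj_y"])


def target_predictor(z: Latent, action: Action) -> Latent:
    # Hypothetical latent dynamics of the *target* embodiment, which acts
    # through polar pushes that the source demonstrator never used.
    angle, mag = action
    return (z[0] + mag * math.cos(angle), z[1] + mag * math.sin(angle))


def dist(a: Latent, b: Latent) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def plan_step(z: Latent, goal: Latent, predictor: Predictor,
              rng: random.Random, n_samples: int = 256,
              max_mag: float = 0.5) -> Action:
    # Random-shooting planner over the target's own action space. The
    # no-op action is always a candidate, so the distance to the goal never grows.
    best_action: Action = (0.0, 0.0)
    best_cost = dist(predictor(z, best_action), goal)
    for _ in range(n_samples):
        action = (rng.uniform(-math.pi, math.pi), rng.uniform(0.0, max_mag))
        cost = dist(predictor(z, action), goal)
        if cost < best_cost:
            best_action, best_cost = action, cost
    return best_action


def imitate(demo_obs: List[Dict[str, float]], z0: Latent,
            predictor: Predictor, steps_per_goal: int = 5,
            rng: Optional[random.Random] = None) -> List[Latent]:
    # Encode each demo frame into a latent goal, then reach the goals in order.
    if rng is None:
        rng = random.Random(0)
    z, trajectory = z0, [z0]
    for goal in (encode_source(o) for o in demo_obs):
        for _ in range(steps_per_goal):
            # Toy simplification: the predictor doubles as the environment.
            z = predictor(z, plan_step(z, goal, predictor, rng))
            trajectory.append(z)
    return trajectory


demo = [
    {"hand_x": 0.2, "hand_y": 0.1, "obj_x": 0.5, "obj_y": 0.0},
    {"hand_x": 0.7, "hand_y": 0.6, "obj_x": 1.0, "obj_y": 0.5},
]
traj = imitate(demo, (0.0, 0.0), target_predictor)
print(len(traj), round(dist(traj[-1], (1.0, 0.5)), 3))
```

`encode_source` ignores the hand pose, so changing how the demonstrator moves its hand leaves the target's behaviour unchanged. Only the demonstrated object states matter. A real system would replace the hand-written encoder and dynamics with learned models.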
