MOTIF: Learning Action Motifs for Few-shot Cross-Embodiment Transfer

Heng Zhi; Wentao Tan; Lei Zhu; Fengling Li; Jingjing Li; Guoli Yang; Heng Tao Shen

arXiv:2602.13764 · cs.RO · February 17, 2026

TL;DR

MOTIF targets few-shot cross-embodiment transfer in robotics: it learns embodiment-agnostic action motifs so that adapting to a different robot embodiment requires only a few demonstrations.

Contribution

The paper proposes MOTIF, a method that learns shared action motifs via vector quantization with progress-aware alignment and embodiment adversarial constraints, enabling cross-embodiment transfer from few demonstrations.

Findings

01 Outperforms baselines by 6.5% in simulation

02 Achieves 43.7% improvement in real-world transfer

03 Validates effectiveness in both simulation and real environments

Abstract

While vision-language-action (VLA) models have advanced generalist robotic learning, cross-embodiment transfer remains challenging due to kinematic heterogeneity and the high cost of collecting sufficient real-world demonstrations to support fine-tuning. Existing cross-embodiment policies typically rely on shared-private architectures, which suffer from limited capacity of private parameters and lack explicit adaptation mechanisms. To address these limitations, we introduce MOTIF for efficient few-shot cross-embodiment transfer that decouples embodiment-agnostic spatiotemporal patterns, termed action motifs, from heterogeneous action data. Specifically, MOTIF first learns unified motifs via vector quantization with progress-aware alignment and embodiment adversarial constraints to ensure temporal and cross-embodiment consistency. We then design a lightweight predictor that predicts…
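The vector quantization step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `quantize`, the toy codebook, and the feature values are all hypothetical, and the progress-aware alignment and embodiment adversarial constraints are left out. It only shows the generic idea of snapping each encoded action segment to its nearest entry in a shared codebook, so that segments from different robots can map to the same discrete index.

```python
import math


def quantize(segment_features, codebook):
    """Return, for each feature vector, the index of its nearest codebook entry.

    Hypothetical helper for illustration only; distance is plain Euclidean.
    """
    indices = []
    for feature in segment_features:
        distances = [math.dist(feature, code) for code in codebook]
        indices.append(distances.index(min(distances)))
    return indices


# Assumed toy codebook: three 2-D entries standing in for learned "motifs".
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Assumed encoded action segments, e.g. produced by encoders for different robots.
features = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (0.95, 0.05)]

motif_ids = quantize(features, codebook)
print(motif_ids)  # [0, 1, 2, 1]
```

In this toy example the second and fourth segments receive the same index even though their raw features differ, which is the sense in which a shared codebook can give heterogeneous inputs a common discrete vocabulary.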

Code & Models

Datasets

Crossingz/ARX5_Piper_Few_shot_Example
dataset · 330 dl

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Social Robot Interaction and HRI · Multimodal Machine Learning Applications