LAP: Language-Action Pre-Training Enables Zero-shot Cross-Embodiment Transfer

Lihan Zha; Asher J. Hancock; Mingtong Zhang; Tenny Yin; Yixuan Huang; Dhruv Shah; Allen Z. Ren; Anirudha Majumdar

arXiv:2602.10556 · cs.RO · February 17, 2026

Open Access · 2 Models

TL;DR

LAP is a natural-language-based pre-training method for vision-language-action models. It enables zero-shot transfer to new robot embodiments and improves manipulation success rates without embodiment-specific fine-tuning.

Contribution

The paper presents LAP, an approach that expresses robot actions in natural language, enabling zero-shot transfer and outperforming prior models on previously unseen robot embodiments.

Findings

01. LAP-3B achieves over 50% zero-shot success on new robots.

02. LAP outperforms previous VLAs by roughly 2x in success rate.

03. LAP enables efficient adaptation and unifies action prediction with visual question answering (VQA).

Abstract

A long-standing goal in robotics is a generalist policy that can be deployed zero-shot on new robot embodiments without per-embodiment adaptation. Despite large-scale multi-embodiment pre-training, existing Vision-Language-Action models (VLAs) remain tightly coupled to their training embodiments and typically require costly fine-tuning. We introduce Language-Action Pre-training (LAP), a simple recipe that represents low-level robot actions directly in natural language, aligning action supervision with the pre-trained vision-language model's input-output distribution. LAP requires no learned tokenizer, no costly annotation, and no embodiment-specific architectural design. Based on LAP, we present LAP-3B, which to the best of our knowledge is the first VLA to achieve substantial zero-shot transfer to previously unseen robot embodiments without any embodiment-specific fine-tuning. Across…
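
To make the idea of "representing low-level robot actions directly in natural language" concrete, here is a minimal sketch in Python. It is not the paper's format: the `DeltaAction` type, the direction words, the centimetre units, and the phrase template are all hypothetical choices made for illustration, since the abstract does not specify how LAP phrases actions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeltaAction:
    """Hypothetical end-effector action: translation in metres plus a gripper command."""
    dx: float
    dy: float
    dz: float
    gripper_open: bool


def _axis_phrase(value_m: float, positive: str, negative: str,
                 threshold_cm: float = 0.1) -> Optional[str]:
    """Describe motion along one axis, or return None if it is negligible."""
    cm = round(value_m * 100.0, 1)  # metres -> centimetres, one decimal place
    if abs(cm) < threshold_cm:
        return None
    direction = positive if cm > 0 else negative
    return f"move {direction} {abs(cm):.1f} cm"


def action_to_language(action: DeltaAction) -> str:
    """Render an action as plain text (assumed phrasing, not the authors' template)."""
    phrases = [
        _axis_phrase(action.dx, "forward", "backward"),
        _axis_phrase(action.dy, "left", "right"),
        _axis_phrase(action.dz, "up", "down"),
    ]
    kept = [p for p in phrases if p is not None]
    kept.append("open gripper" if action.gripper_open else "close gripper")
    return ", ".join(kept)


if __name__ == "__main__":
    print(action_to_language(DeltaAction(0.02, 0.0, -0.015, False)))
```

The point of a text target like this is that it needs no learned action tokenizer: the string can be supervised with the same next-token objective the vision-language model already uses, which is the alignment the abstract describes.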

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI