LAP: Language-Action Pre-Training Enables Zero-shot Cross-Embodiment Transfer
Lihan Zha, Asher J. Hancock, Mingtong Zhang, Tenny Yin, Yixuan Huang, Dhruv Shah, Allen Z. Ren, Anirudha Majumdar

TL;DR
LAP introduces a natural language-based pre-training method for vision-language-action models, enabling zero-shot transfer to new robot embodiments and improving manipulation success rates without embodiment-specific fine-tuning.
Contribution
The paper presents LAP, a pre-training recipe that represents low-level robot actions directly in natural language, enabling zero-shot transfer to unseen robot embodiments and outperforming prior VLAs on them.
Findings
LAP-3B achieves over 50% zero-shot success on new robots.
LAP outperforms previous VLAs by roughly 2x in success rate.
LAP supports efficient adaptation and unifies action prediction with visual question answering (VQA).
Abstract
A long-standing goal in robotics is a generalist policy that can be deployed zero-shot on new robot embodiments without per-embodiment adaptation. Despite large-scale multi-embodiment pre-training, existing Vision-Language-Action models (VLAs) remain tightly coupled to their training embodiments and typically require costly fine-tuning. We introduce Language-Action Pre-training (LAP), a simple recipe that represents low-level robot actions directly in natural language, aligning action supervision with the pre-trained vision-language model's input-output distribution. LAP requires no learned tokenizer, no costly annotation, and no embodiment-specific architectural design. Based on LAP, we present LAP-3B, which to the best of our knowledge is the first VLA to achieve substantial zero-shot transfer to previously unseen robot embodiments without any embodiment-specific fine-tuning. Across…
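As a rough illustration of what "representing low-level robot actions directly in natural language" could look like, the sketch below renders an end-effector displacement as a plain-text phrase that a vision-language model could emit as ordinary tokens. Everything here is an assumption made for illustration: the `EndEffectorDelta` type, the axis conventions, the centimetre units, the deadband, and the phrase template are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndEffectorDelta:
    """Hypothetical low-level action: a Cartesian displacement plus gripper state."""
    dx: float  # metres, +x assumed to mean "forward"
    dy: float  # metres, +y assumed to mean "left"
    dz: float  # metres, +z assumed to mean "up"
    gripper_open: bool


def describe_axis(value_m: float, pos_word: str, neg_word: str,
                  deadband_m: float = 0.005) -> Optional[str]:
    """Turn one axis displacement into a phrase, or None if it is negligible."""
    if abs(value_m) < deadband_m:
        return None
    word = pos_word if value_m > 0 else neg_word
    return f"move {word} {abs(value_m) * 100:.1f} cm"


def action_to_language(delta: EndEffectorDelta) -> str:
    """Render the action as a comma-separated natural-language string (assumed template)."""
    parts = [
        describe_axis(delta.dx, "forward", "backward"),
        describe_axis(delta.dy, "left", "right"),
        describe_axis(delta.dz, "up", "down"),
    ]
    phrases = [p for p in parts if p is not None]
    phrases.append("open gripper" if delta.gripper_open else "close gripper")
    return ", ".join(phrases)


if __name__ == "__main__":
    print(action_to_language(EndEffectorDelta(0.03, -0.012, 0.0, False)))
```

Because the target is ordinary text, a string like this needs no learned action tokenizer and no embodiment-specific output head, which is the property the abstract emphasizes. The actual wording and granularity used by LAP may differ.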
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI
