World Action Models: The Next Frontier in Embodied AI

Siyin Wang; Junhao Shi; Zhaoyang Fu; Xinzhe He; Feihong Liu; Chenchen Yang; Yikang Zhou; Zhaoye Fei; Jingjing Gong; Jinlan Fu; Mike Zheng Shou; Xuanjing Huang; Xipeng Qiu; Yu-Gang Jiang

arXiv:2605.12090 · cs.RO · May 13, 2026

TL;DR

This paper introduces World Action Models (WAMs), a unified framework combining predictive environment modeling with action generation for embodied AI, and systematically surveys the current landscape.

Contribution

It formally defines WAMs, organizes existing methods into a taxonomy, and analyzes the data ecosystems and evaluation protocols in this emerging field.

Findings

01. WAMs unify environment prediction with action generation.

02. Existing methods are categorized into Cascaded and Joint WAMs.

03. Evaluation protocols focus on visual fidelity, physical commonsense, and action plausibility.

Abstract

Vision-Language-Action (VLA) models have achieved strong semantic generalization for embodied policy learning, yet they learn reactive observation-to-action mappings without explicitly modeling how the physical world evolves under intervention. A growing body of work addresses this limitation by integrating world models, predictive models of environment dynamics, into the action generation pipeline. We term this emerging paradigm World Action Models (WAMs): embodied foundation models that unify predictive state modeling with action generation, targeting a joint distribution over future states and actions rather than actions alone. However, the literature remains fragmented across architectures, learning objectives, and application scenarios, lacking a unified conceptual framework. We formally define WAMs and disambiguate them from related concepts, and trace the foundations and early…
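The contrast the abstract draws between a reactive VLA policy and a WAM can be written out as a short sketch. The notation here is an assumption introduced for illustration and is not taken from the paper: $o_{\le t}$ is the observation history, $\ell$ the language instruction, $a_{t:t+H}$ an action chunk over a horizon $H$, and $s_{t+1:t+H}$ the predicted future states. The reading of "Cascaded" as a two-stage factorization and "Joint" as a single model is likewise one plausible interpretation of the category names; the survey's own definitions may differ.

```latex
% Reactive VLA policy: a distribution over actions alone (assumed notation).
\pi_\theta\left(a_{t:t+H} \mid o_{\le t}, \ell\right)

% WAM: a joint distribution over future states and actions.
p_\theta\left(s_{t+1:t+H},\, a_{t:t+H} \mid o_{\le t}, \ell\right)

% Chain-rule factorization of the same joint. A cascaded design would,
% under this reading, fit the two factors as separate stages:
% first predict future states, then decode actions conditioned on them.
p_\theta\left(s_{t+1:t+H},\, a_{t:t+H} \mid o_{\le t}, \ell\right)
  = p_\phi\left(s_{t+1:t+H} \mid o_{\le t}, \ell\right)\,
    p_\psi\left(a_{t:t+H} \mid s_{t+1:t+H},\, o_{\le t}, \ell\right)
```

The factorization in the last line is exact by the chain rule, so the two families would differ in how the factors are parameterized and trained, not in the distribution they target. A joint design, on this reading, would model the left-hand side with a single network instead of two stages.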

Code & Models

Repositories

openmoss/Awesome-WAM (GitHub)
