Survey of Vision-Language-Action Models for Embodied Manipulation

Haoran Li; Yuhui Chen; Wenbo Cui; Weiheng Liu; Kai Liu; Mingcai Zhou; Zhengtao Zhang; Dongbin Zhao

arXiv:2508.15201·cs.RO·November 13, 2025

TL;DR

This survey reviews Vision-Language-Action (VLA) models for embodied manipulation, covering their development, current research, open challenges, and future directions for improving robotic control and interaction.

Contribution

It provides a comprehensive overview of VLA architectures, analyzes research across five key dimensions, and discusses challenges and future research avenues in embodied AI.

Findings

01. Chronicles the development of VLA architectures.

02. Analyzes current research across five key dimensions.

03. Identifies key challenges and future directions.

Abstract

Embodied intelligence systems, which enhance agent capabilities through continuous interaction with the environment, have garnered significant attention from both academia and industry. Vision-Language-Action (VLA) models, inspired by advances in large foundation models, serve as universal robotic control frameworks that substantially improve agent-environment interaction in embodied intelligence systems, which in turn has broadened the application scenarios for embodied AI robots. This survey comprehensively reviews VLA models for embodied manipulation. First, we chronicle the developmental trajectory of VLA architectures. We then analyze current research in detail across five critical dimensions: VLA model structures, training datasets, pre-training methods, post-training methods, and model evaluation. Finally, we synthesize key challenges in VLA development and…

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Social Robot Interaction and HRI