Embodied Robot Manipulation in the Era of Foundation Models: Planning and Learning Perspectives

Shuanghao Bai; Wenxuan Song; Jiayi Chen; Yuheng Ji; Zhide Zhong; Jin Yang; Han Zhao; Wanqi Zhou; Zhe Li; Pengxiang Ding; Cheng Chi; Chang Xu; Xiaolong Zheng; Donglin Wang; Haoang Li; Shanghang Zhang; Badong Chen

arXiv:2512.22983 · cs.RO · December 30, 2025

TL;DR

This survey reviews recent advances in robot manipulation driven by foundation models. It focuses on high-level planning with multimodal reasoning and on learning low-level control, and it highlights open challenges and future directions.

Contribution

The survey provides a unified framework that organizes recent learning-based robot manipulation approaches into high-level planning and low-level control, emphasizing multimodal reasoning for planning and introducing a training-paradigm-oriented taxonomy for control methods.

Findings

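1. Extends the classical notion of task planning to include reasoning over language, code, motion, affordances, and 3D representations.

2. Organizes learning-based control methods along input modeling, latent representation learning, and policy learning.

3. Identifies key open challenges, such as scalability and safety.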

Abstract

Recent advances in vision, language, and multimodal learning have substantially accelerated progress in robotic foundation models, with robot manipulation remaining a central and challenging problem. This survey examines robot manipulation from an algorithmic perspective and organizes recent learning-based approaches within a unified abstraction of high-level planning and low-level control. At the high level, we extend the classical notion of task planning to include reasoning over language, code, motion, affordances, and 3D representations, emphasizing their role in structured and long-horizon decision making. At the low level, we propose a training-paradigm-oriented taxonomy for learning-based control, organizing existing methods along input modeling, latent representation learning, and policy learning. Finally, we identify open challenges and prospective research directions related…

Taxonomy

Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics