Exploring Embodied Multimodal Large Models: Development, Datasets, and Future Directions
Shoubin Chen, Zehao Wu, Kai Zhang, Chunyu Li, Baiyang Zhang, Fei Ma, Fei Richard Yu, Qingquan Li

TL;DR
This paper reviews the development, datasets, challenges, and future directions of embodied multimodal large models, emphasizing their role in perception, cognition, and action in complex environments.
Contribution
It provides a comprehensive analysis of EMLMs, including architectures, datasets, challenges, and future research directions, highlighting gaps and opportunities for advancement.
Findings
EMLM development has evolved with a focus on perception, navigation, and interaction.
Diverse, high-quality datasets are crucial for effective EMLM training.
Key challenges include scalability, generalization, and real-time decision-making.
Abstract
Embodied multimodal large models (EMLMs) have gained significant attention in recent years because of their potential to bridge the gap between perception, cognition, and action in complex, real-world environments. This comprehensive review explores the development of such models, including Large Language Models (LLMs), Large Vision Models (LVMs), and related models, as well as emerging architectures. We discuss the evolution of EMLMs, focusing on embodied perception, navigation, interaction, and simulation. Furthermore, the review provides a detailed analysis of the datasets used to train and evaluate these models, highlighting the importance of diverse, high-quality data for effective learning. The paper also identifies key challenges faced by EMLMs, including scalability, generalization, and real-time decision-making. Finally, we outline future…
Taxonomy
Methods: Softmax · Attention Is All You Need · Focus
