Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
Yafei Hu, Quanting Xie, Vidhi Jain, Jonathan Francis, Jay Patrikar, Nikhil Keetha, Seungchan Kim, Yaqi Xie, Tianyi Zhang, Hao-Shu Fang, Shibo Zhao, Shayegan Omidshafiei, Dong-Ki Kim, Ali-akbar Agha-mohammadi, Katia Sycara, Matthew Johnson-Roberson, Dhruv Batra

TL;DR
This survey explores how foundation models from NLP and CV can be adapted for general-purpose robotics, covering the fundamental barriers, a taxonomy of current approaches, and future challenges.
Contribution
It provides a comprehensive overview of applying foundation models to robotics and proposes a taxonomy for current and future research directions.
Findings
Foundation models show promise for general-purpose robotics.
Current barriers include generalization and robustness issues.
Future directions involve developing robotics-specific foundation models.
Abstract
Building general-purpose robots that operate seamlessly in any environment, with any object, and utilizing various skills to complete diverse tasks has been a long-standing goal in Artificial Intelligence. However, as a community, we have been constraining most robotic systems by designing them for specific tasks, training them on specific datasets, and deploying them within specific environments. These systems require extensively-labeled data and task-specific models. When deployed in real-world scenarios, such systems face several generalization issues and struggle to remain robust to distribution shifts. Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models (i.e., foundation models) in research fields such as Natural Language Processing (NLP) and Computer Vision (CV), we devote this survey to exploring (i)…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
