Map-based Modular Approach for Zero-shot Embodied Question Answering
Koya Sakamoto, Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Motoaki Kawanabe

TL;DR
This paper introduces a map-based modular approach for embodied question answering that enables real-world robots to explore, map, and answer diverse natural language questions in unknown environments, bridging the gap between simulation and real-world application.
Contribution
The paper proposes a novel map-based modular framework leveraging foundation models for zero-shot embodied question answering in real-world environments.
Findings
Effective in real-world navigation and mapping
Robust question answering in unseen environments
Validated through extensive virtual and real-world experiments
Abstract
Embodied Question Answering (EQA) serves as a benchmark task to evaluate the capability of robots to navigate within novel environments and identify objects in response to human queries. However, existing EQA methods often rely on simulated environments and operate with limited vocabularies. This paper presents a map-based modular approach to EQA, enabling real-world robots to explore and map unknown environments. By leveraging foundation models, our method facilitates answering a diverse range of questions using natural language. We conducted extensive experiments in both virtual and real-world settings, demonstrating the robustness of our approach in navigating and comprehending queries within unknown environments.
Taxonomy
Topics: Robotics and Automated Systems · Topic Modeling · Speech and dialogue systems
