Map-based Modular Approach for Zero-shot Embodied Question Answering

Koya Sakamoto; Daichi Azuma; Taiki Miyanishi; Shuhei Kurita; Motoaki Kawanabe

arXiv:2405.16559 · cs.RO · October 15, 2024

TL;DR

This paper introduces a map-based modular approach for embodied question answering that enables real-world robots to explore, map, and answer diverse natural language questions in unknown environments, bridging the gap between simulation and real-world application.

Contribution

The paper proposes a novel map-based modular framework leveraging foundation models for zero-shot embodied question answering in real-world environments.

Findings

01. Effective in real-world navigation and mapping

02. Robust question answering in unseen environments

03. Validated through extensive virtual and real-world experiments

Abstract

Embodied Question Answering (EQA) serves as a benchmark task to evaluate the capability of robots to navigate within novel environments and identify objects in response to human queries. However, existing EQA methods often rely on simulated environments and operate with limited vocabularies. This paper presents a map-based modular approach to EQA, enabling real-world robots to explore and map unknown environments. By leveraging foundation models, our method facilitates answering a diverse range of questions using natural language. We conducted extensive experiments in both virtual and real-world settings, demonstrating the robustness of our approach in navigating and comprehending queries within unknown environments.
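
The explore, map, and answer stages described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the grid world, the `SemanticMap` class, and the `explore` and `answer` functions are hypothetical names introduced here for illustration. Breadth-first traversal stands in for whatever exploration strategy the paper uses, and a keyword lookup stands in for the foundation models that answer questions in the actual method.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SemanticMap:
    """Toy map: the set of explored cells plus where each object label was first seen."""
    explored: set = field(default_factory=set)
    objects: dict = field(default_factory=dict)  # label -> (row, col)

    def update(self, cell, labels):
        self.explored.add(cell)
        for label in labels:
            # Keep the first observed location of each label.
            self.objects.setdefault(label, cell)


def explore(world, start, semantic_map):
    """Visit every reachable cell breadth-first (assumed stand-in for an exploration module)."""
    queue = deque([start])
    seen = {start}
    while queue:
        r, c = queue.popleft()
        semantic_map.update((r, c), world[(r, c)])
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            # Cells missing from `world` are treated as obstacles.
            if nxt in world and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)


def answer(question, semantic_map):
    """Keyword lookup over the map (assumed stand-in for a foundation-model answerer)."""
    words = question.lower().replace("?", "").split()
    for label, cell in semantic_map.objects.items():
        if label in words:
            return f"{label} at {cell}"
    return "unknown"


# Hypothetical 2x2 environment: each free cell lists the objects visible there.
world = {
    (0, 0): [], (0, 1): ["chair"],
    (1, 0): [], (1, 1): ["sofa"],
}
semantic_map = SemanticMap()
explore(world, (0, 0), semantic_map)
result = answer("Where is the sofa?", semantic_map)
print(result)
```

The point of the modular split is that each stage only talks to the map: the exploration routine or the answering component could be swapped out without touching the other.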

Code & Models

Repositories

ATR-DBI/Map-EQA (PyTorch, official)

Taxonomy

Topics: Robotics and Automated Systems · Topic Modeling · Speech and dialogue systems