Embodied Question Answering
Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra

TL;DR
This paper introduces Embodied Question Answering, a complex AI task where agents navigate 3D environments to answer questions, integrating perception, navigation, and reasoning skills.
Contribution
It defines the EmbodiedQA task, develops environments and evaluation protocols, and trains reinforcement learning agents to perform this integrated task.
Findings
Agents can navigate and gather information in 3D environments.
Reinforcement learning enables agents to answer questions based on visual exploration.
The task combines perception, navigation, and reasoning skills.
Abstract
We present a new AI task -- Embodied Question Answering (EmbodiedQA) -- where an agent is spawned at a random location in a 3D environment and asked a question ("What color is the car?"). In order to answer, the agent must first intelligently navigate to explore the environment, gather information through first-person (egocentric) vision, and then answer the question ("orange"). This challenging task requires a range of AI skills -- active perception, language understanding, goal-driven navigation, commonsense reasoning, and grounding of language into actions. In this work, we develop the environments, end-to-end-trained reinforcement learning agents, and evaluation protocols for EmbodiedQA.
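The episode structure described in the abstract (random spawn, navigate, observe, answer) can be illustrated with a toy sketch. This is not the paper's environment or agent: `ToyEnv`, `sweep_actions`, and `run_episode` are hypothetical names, the 3D world is replaced by a small 2D grid, egocentric vision is reduced to "the object is visible only from its own cell", language understanding is reduced to a substring match, and the learned reinforcement learning policy is swapped for a fixed sweep that visits every cell.

```python
import random
from dataclasses import dataclass

# Hypothetical action set for the toy grid: each action is a (dx, dy) offset.
ACTIONS = {"forward": (0, 1), "back": (0, -1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class ToyEnv:
    """Toy 2D stand-in for a 3D environment containing a single object."""
    size: int
    obj_pos: tuple
    obj_name: str
    obj_color: str
    agent_pos: tuple = (0, 0)

    def reset(self, rng):
        # Spawn the agent at a random location, as in the task description.
        self.agent_pos = (rng.randrange(self.size), rng.randrange(self.size))
        return self.observe()

    def observe(self):
        # Simplified first-person view: the object is seen only from its own cell.
        if self.agent_pos == self.obj_pos:
            return (self.obj_name, self.obj_color)
        return None

    def step(self, action):
        dx, dy = ACTIONS[action]
        x, y = self.agent_pos
        # Clamp the move so the agent stays inside the grid.
        self.agent_pos = (
            min(max(x + dx, 0), self.size - 1),
            min(max(y + dy, 0), self.size - 1),
        )
        return self.observe()


def sweep_actions(size):
    """Fixed exploration policy (an assumption, not a learned agent)."""
    # Walk to the corner (0, 0); clamping absorbs any extra moves.
    for _ in range(size - 1):
        yield "left"
    for _ in range(size - 1):
        yield "back"
    # Serpentine sweep, column by column, so every cell is visited.
    for col in range(size):
        vertical = "forward" if col % 2 == 0 else "back"
        for _ in range(size - 1):
            yield vertical
        if col < size - 1:
            yield "right"


def run_episode(env, question, rng):
    """Navigate until the queried object is seen, then answer with its color."""
    obs = env.reset(rng)
    steps = 0
    for action in sweep_actions(env.size):
        if obs is not None and obs[0] in question:
            break
        obs = env.step(action)
        steps += 1
    if obs is not None and obs[0] in question:
        return obs[1], steps
    return "unknown", steps


env = ToyEnv(size=4, obj_pos=(2, 3), obj_name="car", obj_color="orange")
answer, steps = run_episode(env, "What color is the car?", random.Random(0))
print(answer, steps)
```

The point of the sketch is only the control flow: the answer is unavailable at spawn time and becomes available through navigation, which is what separates this task from answering a question about a single given image.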
Taxonomy
Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Robotics and Automated Systems
