TL;DR
This paper introduces Interactive Question Answering (IQA), a task where an autonomous agent answers questions in dynamic visual environments by interacting with objects, and proposes a hierarchical memory network to improve performance.
Contribution
The paper presents the IQA task, a new dataset IQUAD V1, and a Hierarchical Interactive Memory Network (HIMN) that outperforms existing methods on this task.
Findings
HIMN outperforms single-controller reinforcement learning baselines on IQUAD V1.
IQUAD V1 contains 75,000 questions in photo-realistic indoor scenes.
Factorizing control across hierarchical controllers that operate at multiple levels of temporal abstraction improves interaction and reasoning in dynamic visual environments.
Abstract
We introduce Interactive Question Answering (IQA), the task of answering questions that require an autonomous agent to interact with a dynamic visual environment. IQA presents the agent with a scene and a question, like: "Are there any apples in the fridge?" The agent must navigate around the scene, acquire visual understanding of scene elements, interact with objects (e.g. open refrigerators) and plan for a series of actions conditioned on the question. Popular reinforcement learning approaches with a single controller perform poorly on IQA owing to the large and diverse state space. We propose the Hierarchical Interactive Memory Network (HIMN), consisting of a factorized set of controllers, allowing the system to operate at multiple levels of temporal abstraction. To evaluate HIMN, we introduce IQUAD V1, a new dataset built upon AI2-THOR, a simulated photo-realistic environment of…
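The idea of a factorized set of controllers working at multiple levels of temporal abstraction can be sketched with a toy version of the abstract's fridge question. This is a minimal illustration only, not the authors' HIMN. The environment (ToyKitchen), the sub-controller names ("navigate", "manipulate"), and the hand-written rule-based policies are all hypothetical stand-ins for learned networks. The memory component is omitted. The point is structural: a high-level planner picks which sub-controller to invoke, and each sub-controller runs for several primitive steps before returning control.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


class ToyKitchen:
    """Hypothetical toy environment for 'Are there any apples in the fridge?'."""

    def __init__(self, apple_in_fridge: bool):
        self.apple_in_fridge = apple_in_fridge
        self.distance_to_fridge = 3
        self.fridge_open = False

    def observe(self) -> dict:
        return {
            "distance": self.distance_to_fridge,
            "fridge_open": self.fridge_open,
            # Contents are only visible once the agent has opened the fridge.
            "sees_apple": self.fridge_open and self.apple_in_fridge,
        }

    def step(self, action: str) -> dict:
        if action == "move_forward" and self.distance_to_fridge > 0:
            self.distance_to_fridge -= 1
        elif action == "open_fridge" and self.distance_to_fridge == 0:
            self.fridge_open = True
        return self.observe()


@dataclass
class SubController:
    """Low-level controller: runs up to max_steps primitive actions per call."""
    name: str
    policy: Callable[[dict], str]  # observation -> primitive action or "done"
    max_steps: int

    def run(self, env_step: Callable[[str], dict], obs: dict) -> Tuple[dict, List[str]]:
        actions: List[str] = []
        for _ in range(self.max_steps):
            action = self.policy(obs)
            if action == "done":
                break
            actions.append(action)
            obs = env_step(action)
        return obs, actions


# Hand-written stand-ins for learned low-level policies (hypothetical).
def navigator_policy(obs: dict) -> str:
    return "move_forward" if obs["distance"] > 0 else "done"


def manipulator_policy(obs: dict) -> str:
    if obs["distance"] == 0 and not obs["fridge_open"]:
        return "open_fridge"
    return "done"


def planner(obs: dict) -> str:
    """High-level policy: chooses a sub-controller, or decides to answer."""
    if obs["distance"] > 0:
        return "navigate"
    if not obs["fridge_open"]:
        return "manipulate"
    return "answer"


def answer_question(env: ToyKitchen, max_high_level_steps: int = 10
                    ) -> Tuple[Optional[bool], List[Tuple[str, List[str]]]]:
    controllers = {
        "navigate": SubController("navigate", navigator_policy, max_steps=5),
        "manipulate": SubController("manipulate", manipulator_policy, max_steps=1),
    }
    obs = env.observe()
    trace: List[Tuple[str, List[str]]] = []
    for _ in range(max_high_level_steps):
        choice = planner(obs)
        if choice == "answer":
            return obs["sees_apple"], trace
        # Control passes down; the sub-controller acts for several steps.
        obs, actions = controllers[choice].run(env.step, obs)
        trace.append((choice, actions))
    return None, trace  # planner never chose to answer


if __name__ == "__main__":
    answer, trace = answer_question(ToyKitchen(apple_in_fridge=True))
    print(answer, trace)
```

Here the planner makes only three decisions while the agent executes four primitive actions. This separation of "what to do next" from "how to do it" is what lets a hierarchical agent shorten the decision horizon compared with a single controller choosing every primitive action.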
Taxonomy
Methods: Memory Network
