Neural Modular Control for Embodied Question Answering
Abhishek Das, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra

TL;DR
This paper introduces a hierarchical, modular policy framework for embodied question answering that combines imitation learning with reinforcement learning, and reports improved navigation and answering accuracy in indoor environments.
Contribution
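The paper proposes a hierarchical policy architecture in which a master policy issues semantic, human-interpretable subgoals to specialized sub-policies. Imitation learning warm-starts each level of the hierarchy to improve sample efficiency, and subsequent reinforcement learning lets the sub-policies adapt to the consequences of their actions and recover from errors.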
Findings
Outperforms prior methods on the EQA benchmark
Improves navigation accuracy in realistic indoor environments
Enhances question answering performance
Abstract
We present a modular approach for learning policies for navigation over long planning horizons from language input. Our hierarchical policy operates at multiple timescales, where the higher-level master policy proposes subgoals to be executed by specialized sub-policies. Our choice of subgoals is compositional and semantic, i.e. they can be sequentially combined in arbitrary orderings, and assume human-interpretable descriptions (e.g. 'exit room', 'find kitchen', 'find refrigerator', etc.). We use imitation learning to warm-start policies at each level of the hierarchy, dramatically increasing sample efficiency, followed by reinforcement learning. Independent reinforcement learning at each level of hierarchy enables sub-policies to adapt to consequences of their actions and recover from errors. Subsequent joint hierarchical training enables the master policy to adapt to the…
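The two-timescale control loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D corridor world, the `SubPolicy` class, the `master_policy` and `run_episode` functions, the target cells, and the fixed subgoal ordering are all hypothetical stand-ins chosen for illustration. Only the subgoal names come from the abstract. In the paper both levels are learned; here they are hand-written rules so the flow of control is easy to follow.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SubPolicy:
    """Hypothetical low-level policy: walks toward one target cell."""
    name: str
    target: int          # cell at which this subgoal counts as achieved (assumed)
    max_steps: int = 10  # control returns to the master after this many steps

    def act(self, position: int) -> str:
        # Primitive action at the fine timescale.
        if position < self.target:
            return "forward"
        if position > self.target:
            return "backward"
        return "stop"


def step(position: int, action: str) -> int:
    """Toy environment dynamics on a 1-D corridor (assumed, not from the paper)."""
    if action == "forward":
        return position + 1
    if action == "backward":
        return position - 1
    return position


def master_policy(finished: List[str]) -> Optional[str]:
    """Hypothetical high-level policy: proposes the next unfinished subgoal.

    A learned master policy would condition on the question and observations;
    this stand-in follows a fixed plan.
    """
    plan = ["exit room", "find kitchen", "find refrigerator"]
    for subgoal in plan:
        if subgoal not in finished:
            return subgoal
    return None


def run_episode(
    sub_policies: Dict[str, SubPolicy], start: int = 0
) -> Tuple[int, List[str], List[Tuple[str, str]]]:
    position = start
    finished: List[str] = []
    trace: List[Tuple[str, str]] = []
    while True:
        subgoal = master_policy(finished)      # coarse timescale
        if subgoal is None:
            break
        sub = sub_policies[subgoal]
        for _ in range(sub.max_steps):         # fine timescale
            action = sub.act(position)
            if action == "stop":
                break
            position = step(position, action)
            trace.append((subgoal, action))
        # Control returns to the master whether the sub-policy stopped
        # or ran out of steps; this sketch does not retry failed subgoals.
        finished.append(subgoal)
    return position, finished, trace


SUB_POLICIES = {
    "exit room": SubPolicy("exit room", target=2),
    "find kitchen": SubPolicy("find kitchen", target=5),
    "find refrigerator": SubPolicy("find refrigerator", target=4),
}
```

Calling `run_episode(SUB_POLICIES)` returns the final position, the ordered list of subgoals handed out by the master, and a trace of `(subgoal, action)` pairs. The trace makes the hierarchy visible: each subgoal spans several primitive actions, and the master is consulted only when a sub-policy hands control back.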
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
