Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
Yicong Hong, Zun Wang, Qi Wu, Stephen Gould

TL;DR
This paper introduces a waypoint predictor to bridge the gap between discrete and continuous environments in vision-and-language navigation, significantly improving agent performance and generalization across different navigation setups.
Contribution
The paper proposes a novel waypoint predictor trained with refined environment graphs to enable high-level action agents to operate effectively in continuous environments.
Findings
Agents with predicted waypoints outperform low-level action agents in continuous environments.
The method reduces the discrete-to-continuous gap by over 11% in Success weighted by Path Length (SPL).
The agents achieve state-of-the-art results on the R2R-CE and RxR-CE datasets.
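For readers unfamiliar with the SPL metric cited in the findings, the sketch below computes it following the commonly used definition from Anderson et al. (2018): each episode contributes its success indicator times the ratio of the shortest-path length to the larger of the shortest-path length and the length of the path the agent actually took, and the result is averaged over episodes. The function name, argument names, and the sample numbers are illustrative assumptions and are not taken from the paper's code.

```python
from typing import Sequence


def success_weighted_path_length(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Average SPL over episodes (illustrative helper, not the authors' code).

    successes[i]        -- whether episode i ended in success
    shortest_lengths[i] -- shortest-path length from start to goal (must be > 0)
    path_lengths[i]     -- length of the path the agent actually travelled
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same number of episodes")
    if len(successes) == 0:
        raise ValueError("at least one episode is required")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest-path lengths must be positive")
        if success:
            # A successful episode scores 1.0 only if the agent's path is
            # no longer than the shortest path; detours shrink the score.
            total += shortest / max(taken, shortest)
        # Failed episodes contribute 0 regardless of path length.
    return total / len(successes)


# Made-up example: one optimal success, one success with a path twice
# as long as necessary, and one failure.
example_spl = success_weighted_path_length(
    successes=[True, True, False],
    shortest_lengths=[10.0, 10.0, 10.0],
    path_lengths=[10.0, 20.0, 5.0],
)
print(example_spl)  # 0.5
```

Because SPL penalises detours as well as failures, it rewards agents that reach the goal efficiently, not just agents that reach it.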
Abstract
Most existing works in vision-and-language navigation (VLN) focus on either discrete or continuous environments, training agents that cannot generalize across the two. The fundamental difference between the two setups is that discrete navigation assumes prior knowledge of the connectivity graph of the environment, so that the agent can effectively transfer the problem of navigation with low-level controls to jumping from node to node with high-level actions by grounding to an image of a navigable direction. To bridge the discrete-to-continuous gap, we propose a predictor to generate a set of candidate waypoints during navigation, so that agents designed with high-level actions can be transferred to and trained in continuous environments. We refine the connectivity graph of Matterport3D to fit the continuous Habitat-Matterport3D, and train the waypoints predictor with the refined graphs…
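The idea of letting a high-level agent choose among predicted candidate waypoints can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the Waypoint type, the representation of a candidate as a relative heading and a distance, the 2D coordinate convention (headings in radians, counterclockwise from the +x axis), and the externally supplied scores. It is not the paper's predictor or agent, only a toy version of the "pick a candidate, then move towards it" interface that the abstract describes.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Waypoint:
    """Hypothetical candidate waypoint, expressed relative to the agent."""
    heading: float   # radians, relative to the agent's current heading
    distance: float  # metres from the agent's current position


def select_waypoint(candidates: Sequence[Waypoint],
                    scores: Sequence[float]) -> Waypoint:
    """Return the candidate with the highest score.

    In a real system the scores would come from the navigation agent's
    policy; here they are simply passed in.
    """
    if len(candidates) != len(scores):
        raise ValueError("one score is required per candidate")
    if len(candidates) == 0:
        raise ValueError("at least one candidate is required")
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]


def waypoint_to_position(x: float, y: float, agent_heading: float,
                         waypoint: Waypoint) -> Tuple[float, float]:
    """Convert a relative waypoint into absolute 2D coordinates."""
    angle = agent_heading + waypoint.heading
    return (x + waypoint.distance * math.cos(angle),
            y + waypoint.distance * math.sin(angle))


# Made-up candidates and scores for an agent at the origin facing +x.
candidates = [Waypoint(heading=0.0, distance=1.0),
              Waypoint(heading=math.pi / 2, distance=2.0)]
chosen = select_waypoint(candidates, scores=[0.2, 0.8])
target_x, target_y = waypoint_to_position(0.0, 0.0, 0.0, chosen)
```

In a continuous environment, a low-level controller would then have to drive the agent towards the chosen target with primitive actions; that step is omitted here.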
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
