ULN: Towards Underspecified Vision-and-Language Navigation
Weixi Feng, Tsu-Jui Fu, Yujie Lu, William Yang Wang

TL;DR
This paper introduces Underspecified Vision-and-Language Navigation (ULN), a more realistic setting for VLN that uses multi-level instructions, and proposes a new framework that improves robustness and success rates over existing models.
Contribution
The paper defines ULN as a new, more practical VLN setting and develops a framework that combines Granularity Specific Sub-networks (GSS) with an Exploitation-to-Exploration (E2E) module to handle multi-level underspecified instructions.
Findings
Existing VLN models are brittle to multi-level underspecified instructions.
The proposed framework outperforms baselines by roughly 10% in success rate.
The framework is more robust than the baselines in the ULN setting.
Abstract
Vision-and-Language Navigation (VLN) is a task in which an embodied agent is guided to a target position by language instructions. Despite significant performance improvements, the widespread use of fine-grained instructions fails to capture the more practical linguistic variation found in reality. To fill this gap, we introduce a new setting, Underspecified Vision-and-Language Navigation (ULN), and associated evaluation datasets. ULN evaluates agents using multi-level underspecified instructions instead of purely fine-grained or coarse-grained ones, which is a more realistic and general setting. As a first step toward ULN, we propose a VLN framework that consists of a classification module, a navigation agent, and an Exploitation-to-Exploration (E2E) module. Specifically, we propose to learn Granularity Specific Sub-networks (GSS) for the agent to ground multi-level instructions with…
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
