ULN: Towards Underspecified Vision-and-Language Navigation

Weixi Feng; Tsu-Jui Fu; Yujie Lu; William Yang Wang

arXiv:2210.10020 · cs.CV · October 19, 2022

TL;DR

This paper introduces Underspecified Vision-and-Language Navigation (ULN), a more realistic setting for VLN that uses multi-level instructions, and proposes a new framework that improves robustness and success rates over existing models.

Contribution

The paper defines ULN as a new, more practical VLN setting and develops a framework that combines Granularity Specific Sub-networks (GSS) with an Exploitation-to-Exploration (E2E) module to handle multi-level underspecified instructions.

Findings

01 Existing VLN models are brittle to multi-level underspecification.

02 Proposed framework outperforms baselines by ~10% success rate.

03 Framework demonstrates increased robustness in ULN setting.

Abstract

Vision-and-Language Navigation (VLN) is the task of guiding an embodied agent to a target position using language instructions. Despite significant performance improvements, the widespread use of fine-grained instructions fails to capture the more practical linguistic variation found in reality. To fill this gap, we introduce a new setting, namely Underspecified vision-and-Language Navigation (ULN), and associated evaluation datasets. ULN evaluates agents using multi-level underspecified instructions rather than purely fine-grained or coarse-grained ones, which is a more realistic and general setting. As a primary step toward ULN, we propose a VLN framework that consists of a classification module, a navigation agent, and an Exploitation-to-Exploration (E2E) module. Specifically, we propose to learn Granularity Specific Sub-networks (GSS) for the agent to ground multi-level instructions with…
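
The three components named in the abstract (a classification module, a navigation agent with granularity-specific sub-networks, and an E2E module) can be pictured with a rough sketch. This is not the authors' implementation: the abstract excerpt is truncated, so every name below (`classify_granularity`, `should_explore`, `Agent`), the two granularity labels, the word-count heuristic, and the confidence threshold are hypothetical placeholders chosen only to show how such pieces might be wired together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical granularity labels; the source only says "multi-level".
FINE, COARSE = "fine", "coarse"


def classify_granularity(instruction: str, min_fine_words: int = 12) -> str:
    """Toy stand-in for the classification module.

    Assumption: a simple word-count heuristic, used here only so the
    sketch runs. The paper's classifier is not described in the excerpt.
    """
    return FINE if len(instruction.split()) >= min_fine_words else COARSE


def should_explore(action_scores: List[float], tau: float = 0.6) -> bool:
    """Placeholder trigger for switching from exploitation to exploration.

    Assumption: explore when the best action score falls below tau.
    The excerpt does not state how the E2E module actually decides.
    """
    return max(action_scores) < tau


@dataclass
class Agent:
    # One hypothetical sub-network per granularity label. Each is any
    # callable mapping (instruction, observation) to action scores.
    subnetworks: Dict[str, Callable[[str, str], List[float]]]

    def step(self, instruction: str, observation: str) -> Tuple[str, List[float], bool]:
        level = classify_granularity(instruction)             # 1. classify the instruction
        scores = self.subnetworks[level](instruction, observation)  # 2. route to a sub-network
        return level, scores, should_explore(scores)          # 3. decide whether to explore


# Usage with dummy sub-networks that return fixed scores.
agent = Agent({
    FINE: lambda instr, obs: [0.9, 0.1],
    COARSE: lambda instr, obs: [0.5, 0.5],
})
short_result = agent.step("Go to the kitchen", "obs")
long_result = agent.step(
    "Walk past the sofa turn left at the lamp and stop by the door", "obs"
)
```

In this sketch the short instruction is routed to the coarse sub-network and its flat scores trigger exploration, while the longer one is routed to the fine sub-network and stays in exploitation. The routing order is the only part taken from the abstract; the decision rules are illustrative.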

Code & Models

Repositories

weixi-feng/uln
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling