Bridging the visual gap in VLN via semantically richer instructions

Joaquin Ossandón; Benjamin Earle; Álvaro Soto

arXiv:2210.15565·cs.CV·October 28, 2022

TL;DR

This paper identifies that current VLN models underutilize visual information and proposes a data augmentation method using richer, object-based instructions to improve navigation success rates in unseen environments.

Contribution

The paper introduces a novel data augmentation technique that incorporates explicit visual object information into instructions, bridging the semantic gap in VLN datasets.

Findings

01

Training with the richer, object-based instructions yields an 8% increase in success rate on unseen environments

02

State-of-the-art models are not severely affected when given limited or no visual data, indicating strong overfitting to the textual instructions

03

Enhanced instructions lead to better visual understanding in VLN models

Abstract

The Visual-and-Language Navigation (VLN) task requires understanding a textual instruction to navigate a natural indoor environment using only visual information. While this is a trivial task for most humans, it is still an open problem for AI models. In this work, we hypothesize that poor use of the visual information available is at the core of the low performance of current models. To support this hypothesis, we provide experimental evidence showing that state-of-the-art models are not severely affected when they receive just limited or even no visual data, indicating a strong overfitting to the textual instructions. To encourage a more suitable use of the visual information, we propose a new data augmentation method that fosters the inclusion of more explicit visual information in the generation of textual navigational instructions. Our main intuition is that current VLN datasets…
