A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues
Jason Armitage, Leonardo Impett, Rico Sennrich

TL;DR
This paper introduces a priority map module inspired by neuropsychology to enhance feature relevance in vision-and-language navigation, significantly improving task success rates in urban environments.
Contribution
It proposes a novel priority map mechanism integrated into a hierarchical trajectory planning framework for VLN, achieving state-of-the-art results without extensive pretraining.
Findings
Doubles the task completion rate of transformer models.
Achieves state-of-the-art performance on the Touchdown benchmark.
Demonstrates effective feature localization and cross-modal alignment.
Abstract
In a busy city street, a pedestrian surrounded by distractions can pick out a single sign if it is relevant to their route. Artificial agents in outdoor Vision-and-Language Navigation (VLN) are also confronted with detecting supervisory signal on environment features and location in inputs. To boost the prominence of relevant features in transformer-based architectures without costly preprocessing and pretraining, we take inspiration from priority maps - a mechanism described in neuropsychological studies. We implement a novel priority map module and pretrain on auxiliary tasks using low-sample datasets with high-level representations of routes and environment-related references to urban features. A hierarchical process of trajectory planning - with subsequent parameterised visual boost filtering on visual inputs and prediction of corresponding textual spans - addresses the core…
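The idea of boosting the prominence of features near a cued location can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' module: the function name `priority_boost`, the Gaussian weighting, and the parameters `center`, `sigma`, and `gain` are all hypothetical and chosen only for illustration.

```python
import math


def priority_boost(features, center, sigma=1.0, gain=2.0):
    """Scale a 1-D feature sequence by a Gaussian bump centred on `center`.

    Hypothetical illustration only: positions near `center` are amplified
    by up to `gain`, while distant positions keep a weight close to 1.
    """
    boosted = []
    for i, value in enumerate(features):
        # Weight is exactly `gain` at the centre and decays towards 1.0.
        weight = 1.0 + (gain - 1.0) * math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))
        boosted.append(value * weight)
    return boosted


# Toy input: five equally salient positions, with a location cue at index 2.
features = [1.0, 1.0, 1.0, 1.0, 1.0]
boosted = priority_boost(features, center=2, sigma=1.0, gain=2.0)
```

In this toy setting the cued position receives the largest weight and its neighbours receive progressively smaller ones, which is one simple way to picture a priority map raising the relevance of features at a predicted location.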
Taxonomy
Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology
