Local Slot Attention for Vision-and-Language Navigation

Yifeng Zhuang; Qiang Sun; Yanwei Fu; Lifeng Chen; Xiangyang Xue

arXiv:2206.08645 · cs.CV · June 23, 2022

TL;DR

This paper introduces a local slot attention mechanism for vision-and-language navigation (VLN). It combines a slot-attention module that incorporates object segmentation information with a local attention mask that restricts the visual attention span, and it reports state-of-the-art results on the R2R dataset.

Contribution

The paper proposes a new slot-attention module and local attention mask for VLN, enhancing object integrity and spatial focus in transformer-based models.
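
This page does not spell out the module's internals, so the following is only a minimal sketch of the core step of a commonly used slot-attention formulation, not the authors' implementation. It assumes plain dot-product scores and omits the learned projections, normalization layers, and recurrent slot update that a real module would use. The function name `slot_attention_step` and the toy vectors are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def slot_attention_step(slots, inputs, eps=1e-8):
    """One simplified slot-attention iteration (hypothetical helper).

    slots:  list of K vectors of dimension D
    inputs: list of N feature vectors of dimension D
    Returns (new_slots, attn), where attn[i][k] is the share of input i
    claimed by slot k.
    """
    dim = len(inputs[0])
    scale = 1.0 / math.sqrt(dim)
    # Softmax over the slot axis: slots compete for each input feature.
    attn = [softmax([scale * dot(x, s) for s in slots]) for x in inputs]
    new_slots = []
    for k in range(len(slots)):
        # Each slot becomes the weighted mean of the inputs it claimed.
        weights = [attn[i][k] + eps for i in range(len(inputs))]
        total = sum(weights)
        new_slots.append([
            sum(w * x[d] for w, x in zip(weights, inputs)) / total
            for d in range(dim)
        ])
    return new_slots, attn


# Toy data (an assumption): two loose groups of 2-D features, two slots.
inputs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
slots = [[1.0, 0.0], [0.0, 1.0]]
new_slots, attn = slot_attention_step(slots, inputs)
```

Normalizing over slots rather than over inputs is what makes the slots compete for features, which is one way features belonging to the same object can end up grouped together.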

Findings

01 Achieved state-of-the-art results on the R2R dataset.

02 Improved integration of object segmentation information.

03 Reduced noise by restricting visual attention span.
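
The idea of restricting the visual attention span can be illustrated with a boolean mask over panorama views. This is a sketch under stated assumptions, not the paper's exact mask: it assumes the views form a grid of 12 headings by 3 elevations indexed row by row, that headings wrap around, and that a view may attend only to views within `radius` grid steps. The helper names `local_attention_mask` and `masked_softmax` are hypothetical.

```python
import math


def local_attention_mask(num_headings=12, num_elevations=3, radius=1):
    """Build mask[i][j], True when view j is spatially close to view i.

    Assumed layout: view index = elevation * num_headings + heading.
    Headings wrap around the panorama; elevations do not.
    """
    n = num_headings * num_elevations
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        elev_i, head_i = divmod(i, num_headings)
        for j in range(n):
            elev_j, head_j = divmod(j, num_headings)
            d_head = abs(head_i - head_j)
            d_head = min(d_head, num_headings - d_head)  # circular distance
            d_elev = abs(elev_i - elev_j)
            mask[i][j] = d_head <= radius and d_elev <= radius
    return mask


def masked_softmax(scores, allowed):
    """Softmax over scores in which disallowed positions get zero weight."""
    kept = [s for s, ok in zip(scores, allowed) if ok]
    m = max(kept)
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


mask = local_attention_mask()
# Uniform scores for view 0: attention mass is spread only over its neighbors.
weights = masked_softmax([1.0] * 36, mask[0])
```

In a transformer implementation the same effect is usually obtained by adding a large negative value to the disallowed attention logits before the softmax; the explicit zeroing here just keeps the sketch dependency-free.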

Abstract

Vision-and-language navigation (VLN), a frontier study aiming to pave the way for general-purpose robots, has been a hot topic in the computer vision and natural language processing communities. The VLN task requires an agent to navigate to a goal location following natural language instructions in unfamiliar environments. Recently, transformer-based models have achieved significant improvements on the VLN task, since the attention mechanism in the transformer architecture can better integrate inter- and intra-modal information of vision and language. However, there are two problems in current transformer-based models. 1) The models process each view independently without taking the integrity of the objects into account. 2) During the self-attention operation in the visual modality, views that are spatially distant can be interwoven with each other without explicit…

Code & Models

Repositories

patzhuang/lsa
PyTorch · Official

Taxonomy

Methods: Balanced Selection