Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation
Bingqian Lin, Yi Zhu, Xiaodan Liang, Liang Lin, Jianzhuang Liu

TL;DR
This paper introduces Actional Atomic-Concept Learning (AACL), a novel approach that maps visual observations to atomic language concepts to improve alignment and performance in vision-language navigation tasks.
Contribution
AACL is the first method to use atomic action-object concepts as an intermediate bridge between observations and instructions, improving interpretability and achieving state-of-the-art results on VLN benchmarks.
Findings
Achieves new state-of-the-art results on the R2R, REVERIE, and R2R-Last benchmarks.
Improves the interpretability of navigation decisions.
Reduces the semantic gap between visual and linguistic inputs.
Abstract
Vision-Language Navigation (VLN) is a challenging task that requires an agent to align complex visual observations to language instructions to reach the goal position. Most existing VLN agents directly learn to align the raw directional features and visual features trained using one-hot labels to linguistic instruction features. However, the large semantic gap among these multi-modal inputs makes the alignment difficult and therefore limits the navigation performance. In this paper, we propose Actional Atomic-Concept Learning (AACL), which maps visual observations to actional atomic concepts to facilitate the alignment. Specifically, an actional atomic concept is a natural language phrase containing an atomic action and an object, e.g., "go up stairs". These actional atomic concepts, which serve as the bridge between observations and instructions, can effectively mitigate the…
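To make the idea of an actional atomic concept concrete, here is a minimal sketch of how an observation might be turned into a phrase such as "go up stairs". It is not the authors' implementation: the action vocabulary, the angle thresholds, the toy two-dimensional embeddings, and the function names (action_from_direction, object_from_feature, actional_atomic_concept) are all assumptions made for illustration. The sketch picks an atomic action from the relative direction of a candidate view, picks an object by cosine similarity between a visual feature and per-object embeddings, and joins the two.

```python
import math

def action_from_direction(heading_deg, elevation_deg):
    """Pick an atomic action from a relative direction (assumed thresholds)."""
    if elevation_deg > 15:
        return "go up"
    if elevation_deg < -15:
        return "go down"
    h = (heading_deg + 180) % 360 - 180  # normalize heading to [-180, 180)
    if abs(h) <= 30:
        return "go forward"
    if abs(h) >= 150:
        return "go back"
    return "turn right" if h > 0 else "turn left"

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def object_from_feature(visual_feat, object_embeddings):
    """Return the object name whose embedding is closest to the visual feature."""
    return max(object_embeddings,
               key=lambda name: cosine(visual_feat, object_embeddings[name]))

def actional_atomic_concept(heading_deg, elevation_deg, visual_feat, object_embeddings):
    """Join an atomic action and an object into one natural-language phrase."""
    action = action_from_direction(heading_deg, elevation_deg)
    obj = object_from_feature(visual_feat, object_embeddings)
    return f"{action} {obj}"

# Toy embeddings (hypothetical); a real system would use a pretrained encoder.
toy_objects = {"stairs": [1.0, 0.0], "door": [0.0, 1.0]}
concept = actional_atomic_concept(0, 30, [0.9, 0.1], toy_objects)
print(concept)
```

Because the resulting phrase is ordinary language, it can be encoded with the same kind of text encoder as the instruction, which is the sense in which such concepts could act as a bridge between observations and instructions.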
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: Adapter · Contrastive Language-Image Pre-training · ALIGN
