Narrowing the Gap between Vision and Action in Navigation

Yue Zhang; Parisa Kordjamshidi

arXiv:2408.10388 · cs.CV · August 21, 2024

TL;DR

This paper introduces a low-level action decoder and semantic-aware waypoint predictor to bridge the gap between visual perception and low-level control in Vision and Language Navigation, improving navigation performance.

Contribution

It proposes a joint training approach for low-level actions and enhances waypoint prediction with semantic and obstacle information, addressing key limitations of existing VLN-CE methods.

Findings

01 Improved navigation performance on benchmark datasets.

02 Better grounding of visual views to low-level controls.

03 Enhanced waypoint prediction with semantic and obstacle awareness.

Abstract

Existing methods for Vision and Language Navigation in the Continuous Environment (VLN-CE) commonly incorporate a waypoint predictor to discretize the environment. This simplifies the navigation actions into a view selection task and improves navigation performance significantly compared to direct training using low-level actions. However, VLN-CE agents are still far from real robots, since there are gaps between their visual perception and executed actions. First, VLN-CE agents that discretize the visual environment are primarily trained with high-level view selection, which causes them to ignore crucial spatial reasoning within the low-level action movements. Second, in these models, the existing waypoint predictors neglect object semantics and their attributes related to passability, which can be informative in indicating the feasibility of actions. To address these two…
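To make the distinction between high-level view selection and low-level actions concrete, here is a minimal Python sketch. It is not the authors' method: the function names, the candidate waypoints, and the scores are hypothetical, and the step sizes (0.25 m forward, 15 degree turns) are assumed values in the style of common VLN-CE setups rather than figures taken from this page. The sketch picks the highest-scoring candidate waypoint, then expands it into the discrete motor commands an agent would actually execute.

```python
# Assumed step sizes; not stated in the abstract above.
FORWARD_STEP_M = 0.25
TURN_STEP_DEG = 15.0


def select_view(scores):
    """High-level step: return the index of the best-scoring candidate view."""
    return max(range(len(scores)), key=scores.__getitem__)


def waypoint_to_low_level_actions(heading_deg, distance_m):
    """Low-level step: expand one (heading, distance) waypoint into actions.

    heading_deg: angle to the waypoint relative to the agent's facing
                 direction, positive meaning counter-clockwise (left).
    distance_m:  straight-line distance to the waypoint in meters.
    """
    # Normalize into [-180, 180) so the agent takes the shorter turn.
    heading = (heading_deg + 180.0) % 360.0 - 180.0
    n_turns = round(abs(heading) / TURN_STEP_DEG)
    turn = "TURN_LEFT" if heading > 0 else "TURN_RIGHT"
    n_forward = round(distance_m / FORWARD_STEP_M)
    return [turn] * n_turns + ["FORWARD"] * n_forward


# Hypothetical candidates (heading in degrees, distance in meters) and scores.
candidates = [(30.0, 1.0), (-45.0, 0.5), (180.0, 0.75)]
scores = [0.2, 0.7, 0.1]

chosen = candidates[select_view(scores)]
actions = waypoint_to_low_level_actions(*chosen)
print(actions)
```

The expansion here is purely geometric and ignores obstacles between the agent and the waypoint. A policy trained only on the view-selection step never reasons about this expansion, which is the first gap the abstract describes.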

Code & Models

Repositories

HLR/Dual-Action-VLN-CE (Official)
