Ground then Navigate: Language-guided Navigation in Dynamic Scenes

Kanishk Jain, Varun Chhangani, Amogh Tiwari, K. Madhava Krishna and Vineet Gandhi

arXiv:2209.11972 · cs.CV · September 27, 2022 · 1 citation

TL;DR

This paper introduces an approach to vision-and-language navigation for outdoor autonomous driving that grounds the navigable region as a segmentation mask instead of relying on a discretized map, which makes predictions interpretable and supports finer manoeuvres.

Contribution

The paper presents a method that explicitly grounds the navigable region in the visual scene, moving from node selection on a discrete graph towards a continuous action space, and introduces the CARLA-NAV meta-dataset for training and validation.

Findings

01 Effective segmentation-based navigation in outdoor scenes

02 Improved interpretability through visual feedback

03 Validated with extensive empirical results

Abstract

We investigate the Vision-and-Language Navigation (VLN) problem in the context of autonomous driving in outdoor settings. We solve the problem by explicitly grounding the navigable regions corresponding to the textual command. At each timestamp, the model predicts a segmentation mask corresponding to the intermediate or the final navigable region. Our work contrasts with existing efforts in VLN, which pose this task as a node selection problem, given a discrete connected graph corresponding to the environment. We do not assume the availability of such a discretised map. Our work moves towards continuity in action space, provides interpretability through visual feedback and allows VLN on commands requiring finer manoeuvres like "park between the two cars". Furthermore, we propose a novel meta-dataset CARLA-NAV to allow efficient training and validation. The dataset comprises pre-recorded…
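The abstract describes predicting, at each timestamp, a segmentation mask for the intermediate or final navigable region. As a minimal sketch of how such a mask could be turned into a single target point for a downstream planner, the snippet below takes the centroid of the thresholded mask. This is an illustrative assumption, not the authors' method: the function name `mask_to_goal`, the 0.5 threshold, and the centroid rule are all hypothetical.

```python
import numpy as np


def mask_to_goal(prob_mask, threshold=0.5):
    """Return the (row, col) centroid of the predicted navigable region.

    Hypothetical helper: `prob_mask` is assumed to be a 2D array of
    per-pixel probabilities from some segmentation model. Returns None
    when no pixel reaches the threshold.
    """
    binary = prob_mask >= threshold
    if not binary.any():
        return None
    # Pixel coordinates of every location inside the predicted region
    rows, cols = np.nonzero(binary)
    return float(rows.mean()), float(cols.mean())


# Toy example: a 6x8 "image" with a small rectangular navigable region
mask = np.zeros((6, 8))
mask[2:4, 4:7] = 0.9
goal = mask_to_goal(mask)
print(goal)
```

A centroid is only one possible reduction. It can fall outside a non-convex region, so a real system would likely need a more careful choice of target point.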

Code & Models

Repositories

kanji95/carla-nav-tool

Models

linhuixiao/Awesome-Visual-Grounding (model, ♡ 1)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning