Iterative Vision-and-Language Navigation

Jacob Krantz; Shurjo Banerjee; Wang Zhu; Jason Corso; Peter Anderson; Stefan Lee; Jesse Thomason

arXiv:2210.03087·cs.CV·December 27, 2023

Open Access

TL;DR

This paper introduces IVLN, a new paradigm for vision-and-language navigation that emphasizes persistent memory across multiple episodes, better reflecting real-world robot deployment scenarios.

Contribution

It proposes the IVLN paradigm and the accompanying IR2R benchmarks, and shows that map-building agents benefit from environment persistence in VLN tasks.

Findings

01 Map-building agents benefit from environment persistence.

02 Extending implicit memory alone is insufficient for IVLN.

03 New discrete and continuous IR2R benchmarks are introduced, each comprising about 400 tours in 80 indoor scenes.

Abstract

We present Iterative Vision-and-Language Navigation (IVLN), a paradigm for evaluating language-guided agents navigating in a persistent environment over time. Existing Vision-and-Language Navigation (VLN) benchmarks erase the agent's memory at the beginning of every episode, testing the ability to perform cold-start navigation with no prior information. However, deployed robots occupy the same environment for long periods of time. The IVLN paradigm addresses this disparity by training and evaluating VLN agents that maintain memory across tours of scenes that consist of up to 100 ordered instruction-following Room-to-Room (R2R) episodes, each defined by an individual language instruction and a target path. We present discrete and continuous Iterative Room-to-Room (IR2R) benchmarks comprising about 400 tours each in 80 indoor scenes. We find that extending the implicit memory of…
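The difference between standard VLN evaluation and the iterative setting described in the abstract can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: the names `Episode`, `Tour`, `CountingAgent`, and `run_tour` are not from the paper or its code release, the agent's "memory" is reduced to a list of instructions it has seen, and the sketch assumes that memory is cleared at the start of each tour.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    """One R2R-style episode: a language instruction and its target path."""
    instruction: str
    target_path: List[str]


@dataclass
class Tour:
    """An ordered sequence of episodes in a single scene (hypothetical type)."""
    scene_id: str
    episodes: List[Episode]


class CountingAgent:
    """Toy agent whose 'memory' is the list of instructions seen so far."""

    def __init__(self) -> None:
        self.memory: List[str] = []

    def reset_memory(self) -> None:
        self.memory = []

    def act(self, episode: Episode) -> int:
        # Report how much prior experience was available, then remember this episode.
        seen_before = len(self.memory)
        self.memory.append(episode.instruction)
        return seen_before


def run_tour(agent: CountingAgent, tour: Tour, iterative: bool) -> List[int]:
    """Run every episode of a tour in order.

    iterative=True  keeps memory across the episodes of the tour.
    iterative=False erases memory before every episode (cold start).
    """
    agent.reset_memory()  # assumption: memory starts empty at the beginning of a tour
    prior_experience = []
    for episode in tour.episodes:
        if not iterative:
            agent.reset_memory()
        prior_experience.append(agent.act(episode))
    return prior_experience


tour = Tour(
    scene_id="scene_0",
    episodes=[
        Episode("Walk past the sofa and stop at the door.", ["a", "b"]),
        Episode("Go upstairs and wait by the window.", ["b", "c"]),
        Episode("Return to the kitchen.", ["c", "a"]),
    ],
)
print(run_tour(CountingAgent(), tour, iterative=True))   # [0, 1, 2]
print(run_tour(CountingAgent(), tour, iterative=False))  # [0, 0, 0]
```

In the iterative run the agent enters each later episode with experience from earlier ones in the same scene, which is the property the paradigm is designed to evaluate. A real agent would store a map or learned state rather than a list of strings.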

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Speech and dialogue systems