AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans

Dillon Loh; Tomasz Bednarz; Xinxing Xia; Frank Guan

arXiv:2411.18539 · cs.CV · January 7, 2026

TL;DR

AdaVLN extends visual language navigation to dynamic indoor environments with moving humans, introducing a new simulator, datasets, and mechanisms to handle real-world complexities and improve reproducibility.

Contribution

We propose AdaVLN, a novel task and environment for navigation amidst moving humans, along with datasets and a freeze-time mechanism to facilitate research and reproducibility.

Findings

01. Baseline models face challenges with dynamic obstacles.

02. AdaVLN bridges the sim-to-real gap in VLN.

03. New datasets and simulator support dynamic environment research.

Abstract

Visual Language Navigation is a task that challenges robots to navigate in realistic environments based on natural language instructions. While previous research has largely focused on static settings, real-world navigation must often contend with dynamic human obstacles. Hence, we propose an extension to the task, termed Adaptive Visual Language Navigation (AdaVLN), which seeks to narrow this gap. AdaVLN requires robots to navigate complex 3D indoor environments populated with dynamically moving human obstacles, adding a layer of complexity to navigation tasks that mimic the real world. To support exploration of this task, we also present the AdaVLN simulator and the AdaR2R datasets. The AdaVLN simulator enables easy inclusion of fully animated human models directly into common datasets like Matterport3D. We also introduce a "freeze-time" mechanism for both the navigation task and simulator,…

Code & Models

Repositories

dillonloh/adavln (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Hand Gesture Recognition Systems