DyGeoVLN: Infusing Dynamic Geometry Foundation Model into Vision-Language Navigation

Xiangchen Liu; Hanghan Zheng; Jeil Jeong; Minsung Yoon; Lin Zhao; Zhide Zhong; Haoang Li; Sung-Eui Yoon

arXiv:2603.21269·cs.RO·March 24, 2026

Open Access

TL;DR

DyGeoVLN introduces a dynamic geometry-aware framework for vision-language navigation, enabling better generalization and robustness in dynamic environments by integrating 3D spatial reasoning and efficient token pruning.

Contribution

This work presents a novel dynamic geometry foundation model integrated into VLN, with a pose-free token pruning strategy for efficient long-horizon navigation.

Findings

01. Achieves state-of-the-art results on multiple VLN benchmarks.

02. Demonstrates strong robustness in real-world dynamic environments.

03. Reduces inference cost through adaptive token pruning.

Abstract

Vision-Language Navigation (VLN) requires an agent to understand visual observations and language instructions in order to navigate unseen environments. Most existing approaches rely on static-scene assumptions and struggle to generalize to dynamic, real-world scenarios. To address this challenge, we propose DyGeoVLN, a dynamic geometry-aware VLN framework. Our method infuses a dynamic geometry foundation model into the VLN framework through cross-branch feature fusion to enable explicit 3D spatial representation and visual-semantic reasoning. To efficiently compress historical token information in long-horizon, dynamic navigation, we further introduce a novel pose-free and adaptive-resolution token-pruning strategy. This strategy removes spatio-temporally redundant tokens to reduce inference cost. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on…
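The abstract does not say how token redundancy is measured, so the following is only a rough illustration of the general idea of pruning redundant history tokens without camera poses. It is not the paper's strategy. The function names, the greedy cosine-similarity test, and the `threshold` value are all assumptions made for this sketch, and the adaptive-resolution aspect is not modeled.

```python
import math
from typing import List, Sequence

Token = Sequence[float]


def cosine(a: Token, b: Token) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prune_redundant_tokens(
    history: List[Token],
    new_tokens: List[Token],
    threshold: float = 0.95,  # hypothetical value, not taken from the paper
) -> List[Token]:
    """Greedy, pose-free pruning sketch (an assumption, not the authors' method).

    A new token is appended to the history only if it is not nearly
    identical (cosine similarity >= threshold) to a token already kept.
    Only feature similarity is used; no camera pose is required.
    """
    kept = list(history)
    for tok in new_tokens:
        if all(cosine(tok, k) < threshold for k in kept):
            kept.append(tok)
    return kept


# Usage: the first new token nearly duplicates the history token and is
# dropped; the second points in a different direction and is kept.
history = [[1.0, 0.0]]
new_tokens = [[0.99, 0.01], [0.0, 1.0]]
kept = prune_redundant_tokens(history, new_tokens)
print(len(kept))
```

Under this toy criterion the history grows only when a frame contributes features unlike those already stored, which is one simple way the cost of long-horizon inference could be kept bounded.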

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Natural Language Processing Techniques