LCGNav: Local Candidate-Aware Geometric Enhancement for General Topological Planning in Vision-Language Navigation

Jiankun Peng; Jianyuan Guo; Yiguang Yang; Yue Liu; Jiashuang Yan; Ying Xu

arXiv:2605.09053 · cs.CV · May 12, 2026



TL;DR

LCGNav introduces a local geometric enhancement framework for topological vision-language navigation, improving performance by focusing on relevant candidate views and integrating seamlessly with existing models.

Contribution

The paper proposes a local geometric modeling method that enhances topological navigation by explicitly converting candidate depth views into 3D point clouds and applying targeted fusion strategies.
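To make the depth-to-point-cloud step concrete, here is a minimal sketch of pinhole back-projection followed by a range cut, in the spirit of the "physical truncation based on the agent's reachable range" mentioned in the abstract. It is not the authors' implementation: the function name `depth_to_truncated_points`, the 90-degree horizontal field of view, the 3 m cutoff, and the use of Euclidean distance as the truncation criterion are all assumptions chosen for illustration.

```python
import numpy as np


def depth_to_truncated_points(depth, hfov_deg=90.0, max_range=3.0):
    """Back-project a metric depth image to camera-frame 3D points and
    drop points farther than max_range (hypothetical values, see lead-in)."""
    h, w = depth.shape
    # Focal length in pixels for a pinhole camera with the given horizontal FOV.
    f = (w / 2.0) / np.tan(np.radians(hfov_deg) / 2.0)
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0

    # Pixel grid: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)

    # Keep valid depths that lie within the assumed reachable range.
    dist = np.linalg.norm(pts, axis=1)
    keep = (pts[:, 2] > 0) & (dist <= max_range)
    return pts[keep]
```

Calling `depth_to_truncated_points(depth)` on an `(H, W)` array of depths in metres returns an `(M, 3)` array with `M <= H * W`; pixels with zero depth or beyond the cutoff are discarded, which is what makes the local representation more compact.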

Findings

01. Improves key metrics on R2R-CE and RxR-CE benchmarks.

02. Achieves best performance among online topological methods on val-unseen splits.

03. Enhances multiple baseline architectures with low additional training cost.

Abstract

Online topological planning has become an effective paradigm for Vision-Language Navigation in Continuous Environments (VLN-CE), but existing methods still suffer from two limitations: redundant local depth information and weakened focus on current frontier candidates as the topological graph grows. To address these limitations, we propose LCGNav, a modular local geometric enhancement framework for topological VLN. LCGNav explicitly converts candidate depth views into 3D point clouds and applies physical truncation based on the agent's reachable range, enabling more compact local geometric modeling. It further introduces a dimension-preserving local fusion strategy with transient state degradation, so that geometric enhancement is applied only to the currently relevant ghost nodes without changing the original planner interface. Experiments on R2R-CE and RxR-CE show that LCGNav serves as an…
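One possible reading of "dimension-preserving local fusion" is a masked residual update: geometric features are projected to the node feature width and added only to the rows of the currently relevant ghost nodes, so the planner still receives an array of the same shape. The sketch below illustrates that idea only. The function `fuse_local_geometry`, its arguments, and the linear projection `proj` are hypothetical, and the "transient state degradation" mechanism is not modeled because the abstract does not define it.

```python
import numpy as np


def fuse_local_geometry(node_feats, geo_feats, candidate_idx, proj):
    """Add projected geometric features to the current candidate rows only.

    node_feats:    (N, D) float features for all graph nodes
    geo_feats:     (K, G) geometric features for the K current candidates
    candidate_idx: (K,)   unique row indices of those candidates in node_feats
    proj:          (G, D) projection to the node feature width (assumed linear)
    """
    fused = node_feats.copy()
    # Residual update restricted to the selected rows; all other nodes are untouched.
    fused[candidate_idx] += geo_feats @ proj
    return fused
```

Because the output keeps the `(N, D)` shape of `node_feats`, a downstream planner that consumes node features would not need an interface change under this assumption.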


Code & Models

Repositories

shannanshouyin/LCGNav (GitHub)
