UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation
Guangzhao Dai, Jian Zhao, Yuantao Chen, Yusen Qin, Hao Zhao, Guosen Xie, Yazhou Yao, Xiangbo Shu, Xuelong Li

TL;DR
UnitedVLN introduces a novel pre-training paradigm that combines high-fidelity visual rendering and semantic features to improve continuous vision-language navigation in complex environments.
Contribution
The paper proposes UnitedVLN, a generalizable 3DGS-based pre-training method that unites appearance and semantic information for better navigation in continuous environments.
Findings
Outperforms state-of-the-art methods on VLN-CE benchmarks
Effectively integrates visual and semantic features
Enhances exploration in complex navigation scenarios
Abstract
Vision-and-Language Navigation (VLN), where an agent follows instructions to reach a target destination, has recently seen significant advancements. In contrast to navigation in discrete environments with predefined trajectories, VLN in Continuous Environments (VLN-CE) presents greater challenges, as the agent is free to navigate any unobstructed location and is more vulnerable to visual occlusions or blind spots. Recent approaches have attempted to address this by imagining future environments, either through predicted future visual images or semantic features, rather than relying solely on current observations. However, these RGB-based and feature-based methods lack intuitive appearance-level information or high-level semantic complexity crucial for effective navigation. To overcome these limitations, we introduce a novel, generalizable 3DGS-based pre-training paradigm, called…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Automated Systems · Speech and dialogue systems
