UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation

Guangzhao Dai; Jian Zhao; Yuantao Chen; Yusen Qin; Hao Zhao; Guosen Xie; Yazhou Yao; Xiangbo Shu; Xuelong Li

arXiv:2411.16053·cs.CV·March 18, 2025

Open Access

TL;DR

UnitedVLN introduces a novel pre-training paradigm that combines high-fidelity visual rendering and semantic features to improve continuous vision-language navigation in complex environments.

Contribution

The paper proposes UnitedVLN, a generalizable 3DGS-based pre-training method that unites appearance and semantic information for better navigation in continuous environments.

Findings

01. Outperforms state-of-the-art on VLN-CE benchmarks

02. Effectively integrates visual and semantic features

03. Enhances exploration in complex navigation scenarios

Abstract

Vision-and-Language Navigation (VLN), where an agent follows instructions to reach a target destination, has recently seen significant advancements. In contrast to navigation in discrete environments with predefined trajectories, VLN in Continuous Environments (VLN-CE) presents greater challenges, as the agent is free to navigate any unobstructed location and is more vulnerable to visual occlusions or blind spots. Recent approaches have attempted to address this by imagining future environments, either through predicted future visual images or semantic features, rather than relying solely on current observations. However, these RGB-based and feature-based methods lack intuitive appearance-level information or high-level semantic complexity crucial for effective navigation. To overcome these limitations, we introduce a novel, generalizable 3DGS-based pre-training paradigm, called…

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Automated Systems · Speech and dialogue systems