Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation

Zihan Wang; Xiangyang Li; Jiahao Yang; Yeqi Liu; Shuqiang Jiang

arXiv:2406.09798·cs.RO·October 15, 2024·1 cites

TL;DR

This paper introduces a novel sim-to-real transfer method for vision-and-language navigation, enabling monocular robots to achieve panoramic perception and navigation performance comparable to panoramic models, validated in both simulation and real-world environments.

Contribution

The paper proposes a new approach using 3D feature fields and semantic traversable maps to transfer panoramic VLN capabilities to monocular robots, enhancing real-world navigation.

Findings

01. Outperforms previous monocular VLN methods in benchmarks

02. Significantly improves real-world navigation performance

03. Validated in both simulation and real environments

Abstract

Vision-and-language navigation (VLN) requires an agent to navigate to a remote location in a 3D environment by following a natural language instruction. In this field, agents are usually trained and evaluated in navigation simulators, and effective approaches for sim-to-real transfer are lacking. VLN agents with only a monocular camera exhibit extremely limited performance, while mainstream VLN models trained with panoramic observations perform better but are difficult to deploy on most monocular robots. To address this, we propose a sim-to-real transfer approach that endows monocular robots with panoramic traversability perception and panoramic semantic understanding, thus smoothly transferring high-performance panoramic VLN models to common monocular robots. In this work, the semantic traversable map is proposed to predict agent-centric navigable waypoints, and the novel…

Code & Models

Repositories

MrZihan/Sim2Real-VLN-3DFF
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization