NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM

Zihan Wang; Yaohui Zhu; Gim Hee Lee; Yachun Fan

arXiv:2502.11142 · cs.AI · March 10, 2025

TL;DR

NavRAG introduces a retrieval-augmented generation framework that creates diverse, user-demand-oriented instructions for embodied navigation, enhancing data quality and model performance in vision-and-language navigation tasks.

Contribution

The paper presents NavRAG, a novel RAG framework that generates high-quality, diverse navigation instructions by leveraging LLMs and scene retrieval, addressing limitations of previous data augmentation methods.

Findings

01 Annotated over 2 million navigation instructions across 861 scenes.

02 NavRAG improves navigation model performance with diverse instruction data.

03 Generated instructions better match user communication styles.

Abstract

Vision-and-Language Navigation (VLN) is an essential skill for embodied agents, allowing them to navigate 3D environments by following natural language instructions. High-performance navigation models require a large amount of training data, and the high cost of manual annotation has seriously hindered progress in this field. To expand the available data, some previous methods translate trajectory videos into step-by-step instructions, but such instructions do not match users' communication styles well, since users tend to briefly describe a destination or state a specific need. Moreover, local navigation trajectories overlook global context and high-level task planning. To address these issues, we propose NavRAG, a retrieval-augmented generation (RAG) framework that generates user demand instructions for VLN. NavRAG leverages an LLM to build a hierarchical scene description tree for 3D scene understanding from…
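The abstract is cut off before it explains how the scene description tree is built or queried, so the following is only a rough illustration of the general idea of top-down retrieval over a hierarchical scene description. Everything in it is an assumption made for the example: the `SceneNode` class, the three levels (house, room, object), the toy descriptions, and the word-overlap score, which stands in for whatever retrieval method the paper actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class SceneNode:
    """One node of a hypothetical scene description tree."""
    name: str
    description: str
    children: list["SceneNode"] = field(default_factory=list)


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of distinct words shared by query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(root: SceneNode, query: str) -> list[str]:
    """Descend from the root, following the best-matching child at each level."""
    path = [root.name]
    node = root
    while node.children:
        node = max(node.children, key=lambda c: overlap(query, c.description))
        path.append(node.name)
    return path


# Hand-written toy scene; a real system would derive descriptions from observations.
house = SceneNode("house", "two-floor home", [
    SceneNode("kitchen", "room to cook and store food and drink", [
        SceneNode("fridge", "cold storage for food and drink"),
        SceneNode("stove", "gas stove to cook meals"),
    ]),
    SceneNode("bedroom", "room to sleep with bed and lamp", [
        SceneNode("bed", "double bed to sleep or rest"),
        SceneNode("lamp", "reading lamp beside the bed"),
    ]),
])

print(retrieve(house, "i want a cold drink"))
```

In this sketch a demand-style request such as "i want a cold drink" is resolved to a destination by matching it against coarse descriptions first and finer ones afterwards, which is one simple way a tree of scene descriptions could supply global context for a brief user request.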

Code & Models

Repositories

MrZihan/NavRAG (PyTorch, official)

Taxonomy

Topics: Data Management and Algorithms · Speech and dialogue systems · Video Analysis and Summarization