DAgger Diffusion Navigation: DAgger Boosted Diffusion Policy for Vision-Language Navigation
Haoxiang Shi, Xiang Deng, Zaijing Li, Gongwei Chen, Yaowei Wang, Liqiang Nie

TL;DR
This paper introduces DAgger Diffusion Navigation (DifNav), an end-to-end diffusion policy for Vision-Language Navigation in Continuous Environments (VLN-CE). It models multi-modal actions directly in continuous space and outperforms traditional two-stage waypoint-based methods.
Contribution
The paper proposes a unified diffusion policy for VLN-CE that eliminates the need for waypoint prediction and incorporates DAgger training for robustness and error recovery.
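DAgger (Dataset Aggregation) trains a policy on the states the learner itself visits. An expert labels each of those states with the correct action, so the policy learns how to recover from its own mistakes. The loop below is a minimal generic sketch, not DifNav's training procedure. It replaces a 3D scene with a toy 1-D corridor, and `expert`, `TabularLearner`, the halving β schedule, and the start positions are all hypothetical. This summary does not describe how DifNav collects or relabels its data.

```python
import random

# Hypothetical toy task: integer positions on a corridor, goal at GOAL.
GOAL = 5
ACTIONS = (-1, 0, 1)  # step left, stop, step right


def expert(state: int) -> int:
    """Hypothetical oracle that always steps toward the goal."""
    if state < GOAL:
        return 1
    if state > GOAL:
        return -1
    return 0


class TabularLearner:
    """Hypothetical stand-in for the learned policy: majority vote per state."""

    def __init__(self):
        self.counts = {}

    def fit(self, dataset):
        self.counts = {}
        for s, a in dataset:
            self.counts.setdefault(s, {}).setdefault(a, 0)
            self.counts[s][a] += 1

    def act(self, state):
        if state not in self.counts:
            return random.choice(ACTIONS)  # unseen state: arbitrary behaviour
        votes = self.counts[state]
        return max(votes, key=votes.get)


def rollout(policy, start, horizon=12):
    """Execute a policy and return the visited states (positions clamped to [-10, 10])."""
    states, s = [], start
    for _ in range(horizon):
        states.append(s)
        s = max(-10, min(10, s + policy(s)))
    return states


def dagger(iterations=5, starts=(-8, 0, 9), beta0=1.0, seed=0):
    random.seed(seed)
    learner, dataset = TabularLearner(), []
    for i in range(iterations):
        beta = beta0 * (0.5 ** i)  # probability of executing the expert's action

        def mixed(s):
            return expert(s) if random.random() < beta else learner.act(s)

        for start in starts:
            for s in rollout(mixed, start):
                dataset.append((s, expert(s)))  # expert relabels every visited state
        learner.fit(dataset)  # retrain on the aggregated dataset
    return learner


learner = dagger()
print(rollout(learner.act, 0))
```

With β = 1 the first iteration is plain behaviour cloning. Later iterations let the learner's own actions determine which states are visited, and the expert supplies the recovery label for each of those states.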
Findings
Outperforms previous state-of-the-art methods on VLN-CE benchmark datasets.
Eliminates reliance on waypoint predictors, simplifying the navigation pipeline.
Enhances robustness and long-horizon spatial reasoning in navigation tasks.
Abstract
Vision-Language Navigation in Continuous Environments (VLN-CE) requires agents to follow natural language instructions through free-form 3D spaces. Existing VLN-CE approaches typically use a two-stage waypoint planning framework, where a high-level waypoint predictor generates the navigable waypoints, and then a navigation planner suggests the intermediate goals in the high-level action space. However, this two-stage decomposition framework suffers from: (1) global sub-optimization due to the proxy objective in each stage, and (2) a performance bottleneck caused by the strong reliance on the quality of the first-stage predicted waypoints. To address these limitations, we propose DAgger Diffusion Navigation (DifNav), an end-to-end optimized VLN-CE policy that unifies the traditional two stages, i.e., waypoint generation and planning, into a single diffusion policy. Notably, DifNav employs…
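A diffusion policy produces an action by starting from Gaussian noise and repeatedly denoising it. Because of this, it can commit to one of several valid actions instead of averaging them. The sketch below shows generic DDPM ancestral sampling (linear schedule, σ_t² = β_t) over 2-D continuous actions, under stated assumptions. It is not DifNav's model: the learned noise predictor, which in DifNav would be conditioned on observations and the instruction, is replaced by a hypothetical `oracle_eps`. That function is exact when the clean actions are a mixture of point masses. The two modes (detour left or right around an obstacle), their weights, and all function names are illustrative.

```python
import numpy as np


def make_schedule(T=1000, beta_min=1e-4, beta_max=0.02):
    """Linear DDPM noise schedule."""
    betas = np.linspace(beta_min, beta_max, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def oracle_eps(x, alpha_bar, modes, weights):
    """Hypothetical stand-in for a trained eps_theta(a_t, t | observation, instruction).

    Exact noise prediction when clean actions are a mixture of point masses at `modes`.
    """
    sq_dist = ((x[:, None, :] - np.sqrt(alpha_bar) * modes[None, :, :]) ** 2).sum(-1)
    logits = np.log(weights)[None, :] - sq_dist / (2.0 * (1.0 - alpha_bar))
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    post = np.exp(logits)
    post /= post.sum(axis=1, keepdims=True)  # p(mode k | a_t)
    x0_mean = post @ modes  # E[a_0 | a_t]
    return (x - np.sqrt(alpha_bar) * x0_mean) / np.sqrt(1.0 - alpha_bar)


def sample_actions(modes, weights, n=200, seed=0):
    """DDPM ancestral sampling of n continuous actions, starting from a_T ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule()
    x = rng.standard_normal((n, modes.shape[1]))
    for t in reversed(range(len(betas))):
        eps = oracle_eps(x, alpha_bars[t], modes, weights)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0  # no noise on last step
        x = mean + np.sqrt(betas[t]) * noise
    return x


# Two equally valid (dx, dy) moves: detour left or right around an obstacle.
modes = np.array([[-1.0, 1.0], [1.0, 1.0]])
samples = sample_actions(modes, np.array([0.5, 0.5]))
print(samples[:5].round(3))
```

Each sample lands on one of the two modes. A regression policy trained with a mean-squared loss on the same data would output the average (0, 1) and walk straight into the obstacle. This is the sense in which a diffusion policy models multi-modal actions.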
