VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs
Wensi Huang, Shaohao Zhu, Meng Wei, Jinming Xu, Xihui Liu, Hanqing Wang, Tai Wang, Feng Zhao, Jiangmiao Pang

TL;DR
This paper introduces VL-LN Bench, a new benchmark for long-horizon goal-oriented navigation that incorporates active dialog to resolve ambiguity, enabling more realistic and effective embodied navigation models.
Contribution
It proposes the VL-LN benchmark and a new task, Interactive Instance Goal Navigation (IIGN), for training dialog-enabled navigation agents under realistic, ambiguous instructions.
Findings
Dialog-enabled navigation models outperform baselines.
The VL-LN dataset contains over 41k trajectories for training.
Active dialog improves navigation success when instructions are ambiguous.
Abstract
In most existing embodied navigation tasks, instructions are well-defined and unambiguous, such as instruction following and object searching. Under this idealized setting, agents are required solely to produce effective navigation outputs conditioned on vision and language inputs. However, real-world navigation instructions are often vague and ambiguous, requiring the agent to resolve uncertainty and infer user intent through active dialog. To address this gap, we propose Interactive Instance Goal Navigation (IIGN), a task that requires agents not only to generate navigation actions but also to produce language outputs via active dialog, thereby aligning more closely with practical settings. IIGN extends Instance Goal Navigation (IGN) by allowing agents to freely consult an oracle in natural language while navigating. Building on this task, we present the Vision Language-Language…
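The interaction pattern the abstract describes, an agent that interleaves navigation actions with natural-language questions to an oracle, can be sketched as a simple episode loop. This is only an illustrative sketch and not the benchmark's actual interface: the names `Move`, `Ask`, `run_episode`, `toy_policy`, and `toy_oracle`, the action strings, and the step limit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Move:
    """A navigation output (hypothetical action names such as 'forward' or 'stop')."""
    action: str


@dataclass
class Ask:
    """A language output: a question the agent poses to the oracle."""
    question: str


Step = Union[Move, Ask]
Dialog = List[Tuple[str, str]]  # (question, answer) pairs


def run_episode(
    policy: Callable[[Dialog, List[str]], Step],
    oracle: Callable[[str], str],
    max_steps: int = 10,  # assumed step budget, not taken from the paper
) -> Tuple[List[str], Dialog]:
    """Run one episode in which the agent may either move or ask at each step."""
    dialog: Dialog = []
    actions: List[str] = []
    for _ in range(max_steps):
        step = policy(dialog, actions)
        if isinstance(step, Ask):
            # The agent consults the oracle and stores the answer for later steps.
            dialog.append((step.question, oracle(step.question)))
        else:
            actions.append(step.action)
            if step.action == "stop":
                break
    return actions, dialog


def toy_policy(dialog: Dialog, actions: List[str]) -> Step:
    """Hypothetical policy: ask once to disambiguate, then walk and stop."""
    if not dialog:
        return Ask("Which chair do you mean?")
    if len(actions) < 2:
        return Move("forward")
    return Move("stop")


def toy_oracle(question: str) -> str:
    """Hypothetical oracle that always gives the same disambiguating answer."""
    return "The red chair next to the window."


actions, dialog = run_episode(toy_policy, toy_oracle)
print(actions)
print(dialog)
```

In this toy run the agent asks one question before moving. A real agent would also condition on visual observations, which the sketch leaves out.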
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Speech and dialogue systems
