NavComposer: Composing Language Instructions for Navigation Trajectories through Action-Scene-Object Modularization
Zongtao He, Liuyi Wang, Lu Chen, Chengju Liu, Qijun Chen

TL;DR
NavComposer is a modular framework that automatically generates high-quality, diverse navigation instructions by decomposing semantic entities (actions, scenes, objects) and recomposing them into natural language. It is paired with an annotation-free evaluation system, supporting scalable embodied AI research.
Contribution
The paper introduces two components: NavComposer, a novel modular approach to automatic navigation instruction generation, and NavInstrCritic, an annotation-free system for evaluating generated instructions. Together they aim to improve the scalability and diversity of instruction data for embodied AI navigation tasks.
Findings
NavComposer produces rich, accurate instructions across diverse environments.
NavInstrCritic effectively evaluates instruction quality without expert annotations.
Experiments show improved instruction quality and evaluation robustness.
Abstract
Language-guided navigation is a cornerstone of embodied AI, enabling agents to interpret language instructions and navigate complex environments. However, expert-provided instructions are limited in quantity, while synthesized annotations often lack quality, making them insufficient for large-scale research. To address this, we propose NavComposer, a novel framework for automatically generating high-quality navigation instructions. NavComposer explicitly decomposes semantic entities such as actions, scenes, and objects, and recomposes them into natural language instructions. Its modular architecture allows flexible integration of state-of-the-art techniques, while the explicit use of semantic entities enhances both the richness and accuracy of instructions. Moreover, it operates in a data-agnostic manner, supporting adaptation to diverse navigation trajectories without domain-specific…
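The abstract describes a decompose-then-recompose design in which actions, scenes, and objects are extracted explicitly and each stage is a swappable module. The sketch below illustrates only that structure under stated assumptions; it is not NavComposer's implementation. All names (`Step`, `Entities`, `rule_action`, `label_scene`, `first_object`, `template_composer`, `compose_instruction`) are hypothetical. Scene and object labels are assumed to be given per step rather than predicted from images, and a fixed string template stands in for whatever language generator the framework actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Step:
    """Hypothetical per-step observation with pre-extracted fields (assumption)."""
    heading_change: float  # degrees turned at this step; positive = left (assumed convention)
    scene: str             # hypothetical scene label, e.g. "hallway"
    objects: List[str]     # hypothetical detected objects


@dataclass
class Entities:
    """Explicit semantic entities for one step: action, scene, object."""
    action: str
    scene: str
    obj: str


def rule_action(step: Step) -> str:
    # Toy action module: threshold the heading change into a discrete action.
    if step.heading_change > 30:
        return "turn left"
    if step.heading_change < -30:
        return "turn right"
    return "go forward"


def label_scene(step: Step) -> str:
    # Toy scene module: pass through the given label.
    return step.scene


def first_object(step: Step) -> str:
    # Toy object module: pick the first detected object, if any.
    return step.objects[0] if step.objects else ""


def template_composer(entities: Sequence[Entities]) -> str:
    # Toy composer: a fixed template, standing in for a real language generator.
    clauses = []
    for e in entities:
        clause = f"{e.action} in the {e.scene}"
        if e.obj:
            clause += f" past the {e.obj}"
        clauses.append(clause)
    if not clauses:
        return ""
    text = ", then ".join(clauses)
    return text[0].upper() + text[1:] + "."


def compose_instruction(
    trajectory: Sequence[Step],
    act: Callable[[Step], str] = rule_action,
    scn: Callable[[Step], str] = label_scene,
    obj: Callable[[Step], str] = first_object,
    composer: Callable[[Sequence[Entities]], str] = template_composer,
) -> str:
    # Decompose: extract explicit action/scene/object entities for every step.
    entities = [Entities(act(s), scn(s), obj(s)) for s in trajectory]
    # Recompose: turn the entity sequence into a natural-language instruction.
    return composer(entities)


demo = [Step(0.0, "hallway", ["painting"]), Step(90.0, "kitchen", [])]
print(compose_instruction(demo))
# Go forward in the hallway past the painting, then turn left in the kitchen.

# Swapping a single module (here the composer) leaves the rest untouched.
print(compose_instruction(demo, composer=lambda es: " / ".join(e.action for e in es)))
# go forward / turn left
```

Because every stage is an ordinary callable, a learned action classifier, a scene recognizer, or an object detector could replace the toy functions without changing `compose_instruction`, which mirrors the kind of flexible integration the abstract attributes to the modular architecture.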
