Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
Zun Wang, Jialu Li, Yicong Hong, Songze Li, Kunchang Li, Shoubin Yu, Yi Wang, Yu Qiao, Yali Wang, Mohit Bansal, Limin Wang

TL;DR
The paper introduces a self-refining data flywheel that iteratively improves training data quality for language-guided navigation, yielding performance gains that surpass human benchmarks.
Contribution
It presents a novel self-refining data generation process that enhances training data quality without human annotation, improving navigation performance and generalization.
Findings
Navigator performance improved from 70% to 78% SPL.
Generated data quality surpassed previous methods, with SPICE increasing from 23.5 to 26.2.
Achieved state-of-the-art results across multiple navigation tasks.
Abstract
Creating high-quality data for training robust language-instructed agents is a long-lasting challenge in embodied AI. In this paper, we introduce a Self-Refining Data Flywheel (SRDF) that generates high-quality and large-scale navigational instruction-trajectory pairs by iteratively refining the data pool through the collaboration between two models, the instruction generator and the navigator, without any human-in-the-loop annotation. Specifically, SRDF starts with using a base generator to create an initial data pool for training a base navigator, followed by applying the trained navigator to filter the data pool. This leads to higher-fidelity data to train a better generator, which can, in turn, produce higher-quality data for training the next-round navigator. Such a flywheel establishes a data self-refining process, yielding a continuously improved and highly effective dataset for…
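The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `run_flywheel`, `Pair`, `train_navigator`, `train_generator`, `fidelity`, and `threshold` are all hypothetical, and the fidelity score stands in for whatever trajectory-agreement measure the navigator-based filter actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Trajectory = Tuple[int, ...]
Generator = Callable[[Trajectory], str]   # trajectory -> instruction
Navigator = Callable[[str], Trajectory]   # instruction -> predicted trajectory


@dataclass(frozen=True)
class Pair:
    """One instruction-trajectory pair in the data pool (hypothetical type)."""
    instruction: str
    trajectory: Trajectory


def run_flywheel(
    trajectories: Sequence[Trajectory],
    base_generator: Generator,
    train_navigator: Callable[[List[Pair]], Navigator],
    train_generator: Callable[[List[Pair]], Generator],
    fidelity: Callable[[Trajectory, Trajectory], float],
    threshold: float,
    rounds: int,
) -> Tuple[List[Pair], Generator]:
    """Alternate between training a navigator and a generator.

    All callables are placeholders supplied by the caller; `threshold`
    is an assumed cutoff on the assumed `fidelity` score.
    """
    generator = base_generator
    # Initial pool: the base generator labels every trajectory.
    pool = [Pair(generator(t), t) for t in trajectories]

    for _ in range(rounds):
        # Train a navigator on the current pool.
        navigator = train_navigator(pool)
        # Keep only pairs whose instruction the navigator can follow
        # back to (approximately) the original trajectory.
        kept = [
            p for p in pool
            if fidelity(navigator(p.instruction), p.trajectory) >= threshold
        ]
        # Train a better generator on the filtered, higher-fidelity data.
        generator = train_generator(kept)
        # Regenerate the pool for the next-round navigator.
        pool = [Pair(generator(t), t) for t in trajectories]

    return pool, generator
```

Passing the training routines in as callables keeps the sketch independent of any particular model or framework; in practice each call would be a full training run, and how the filtered and regenerated data are combined is a design choice the abstract does not specify.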
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems
Methods: Balanced Selection · Semi-Pseudo-Label
