Distilling LLM Prior to Flow Model for Generalizable Agent's Imagination in Object Goal Navigation
Badi Li, Ren-jie Lu, Yu Zhou, Jingke Meng, Wei-shi Zheng

TL;DR
This paper introduces GOAL, a flow-based generative framework that leverages LLM-derived spatial priors to improve semantic map completion, enhancing generalization for object goal navigation in unseen environments.
Contribution
The work presents a novel flow-based model that incorporates LLM-inferred spatial priors for semantic map completion, advancing generalization in indoor environment navigation tasks.
Findings
Achieves state-of-the-art results on the MP3D and Gibson datasets.
Demonstrates strong transferability to unseen environments such as HM3D.
Effectively models uncertainty in indoor layouts using LLM-enriched priors.
Abstract
The Object Goal Navigation (ObjectNav) task challenges agents to locate a specified object in an unseen environment by imagining unobserved regions of the scene. Prior approaches rely on deterministic and discriminative models to complete semantic maps, overlooking the inherent uncertainty in indoor layouts and limiting their ability to generalize to unseen environments. In this work, we propose GOAL, a generative flow-based framework that models the semantic distribution of indoor environments by bridging observed regions with LLM-enriched full-scene semantic maps. During training, spatial priors inferred from large language models (LLMs) are encoded as two-dimensional Gaussian fields and injected into target maps, distilling rich contextual knowledge into the flow model and enabling more generalizable completions. Extensive experiments demonstrate that GOAL achieves state-of-the-art…
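To make the Gaussian-field idea in the abstract concrete, here is a minimal sketch of how a spatial prior might be rasterized and written into one channel of a semantic map. It is not the authors' implementation: the function names `gaussian_field` and `inject_prior`, the isotropic form of the Gaussian, the `sigma` value, the map layout `(channels, height, width)`, and the element-wise maximum used to merge the prior with the existing map are all assumptions made for illustration.

```python
import numpy as np


def gaussian_field(height, width, center, sigma):
    """Return a (height, width) isotropic 2D Gaussian with peak value 1 at `center`.

    `center` is a (row, col) pair; `sigma` is the spread in grid cells.
    Both are hypothetical parameters, not values taken from the paper.
    """
    rows, cols = np.mgrid[0:height, 0:width]
    sq_dist = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def inject_prior(target_map, channel, center, sigma):
    """Merge a Gaussian prior into one semantic channel of a copy of `target_map`.

    The element-wise maximum is an assumed merge rule: it keeps existing
    evidence intact and only raises cells where the prior is stronger.
    """
    enriched = target_map.copy()
    _, height, width = enriched.shape
    field = gaussian_field(height, width, center, sigma)
    enriched[channel] = np.maximum(enriched[channel], field)
    return enriched


# Toy example: a 3-class, 32x32 map with a prior placed on class 1.
semantic_map = np.zeros((3, 32, 32), dtype=np.float32)
enriched = inject_prior(semantic_map, channel=1, center=(10, 20), sigma=3.0)
```

In this sketch the prior is a soft, spatially extended hint and not a hard label, which is one plausible way to represent the kind of layout uncertainty the abstract describes. The input map is left unmodified, so the same observed map can be paired with several enriched targets.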
