FOM-Nav: Frontier-Object Maps for Object Goal Navigation
Thomas Chabal, Shizhe Chen, Jean Ponce, Cordelia Schmid

TL;DR
FOM-Nav combines a novel frontier-object map representation with vision-language models to make object goal navigation more efficient, achieving state-of-the-art SPL on the MP3D and HM3D benchmarks and demonstrating real-world applicability.
Contribution
The paper presents FOM-Nav, a modular system that integrates frontier-object maps and multimodal scene understanding for enhanced navigation in unknown environments.
Findings
Achieves state-of-the-art SPL on MP3D and HM3D benchmarks.
Automatically constructs large-scale navigation datasets from real-world scanned environments.
Demonstrates effective real-robot navigation performance.
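The SPL metric named in the findings is Success weighted by Path Length, the standard efficiency measure for embodied navigation: each successful episode contributes the ratio of the shortest-path length to the larger of the agent's path length and that shortest-path length, and the result is averaged over all episodes. The sketch below is a minimal illustration of that definition. The function name `spl` and its argument names are assumptions made for this example, not code from the paper.

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length, averaged over episodes.

    successes:        1 (or True) if the episode reached the goal, else 0
    shortest_lengths: shortest-path length from start to goal, per episode
    path_lengths:     length of the path the agent actually took, per episode
    (Hypothetical helper for illustration; argument names are assumptions.)
    """
    n = len(successes)
    if n == 0 or n != len(shortest_lengths) or n != len(path_lengths):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        denom = max(taken, shortest)
        # Failed episodes contribute 0; so do degenerate zero-length episodes.
        if success and denom > 0:
            total += shortest / denom
    return total / n
```

An agent that succeeds along the shortest path scores 1 for that episode, one that succeeds along a path twice as long scores 0.5, and a failure scores 0 regardless of path length.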
Abstract
This paper addresses the Object Goal Navigation problem, where a robot must efficiently find a target object in an unknown environment. Existing implicit memory-based methods struggle with long-term memory retention and planning, while explicit map-based approaches lack rich semantic information. To address these challenges, we propose FOM-Nav, a modular framework that enhances exploration efficiency through Frontier-Object Maps and vision-language models. Our Frontier-Object Maps are built online and jointly encode spatial frontiers and fine-grained object information. Using this representation, a vision-language model performs multimodal scene understanding and high-level goal prediction, which is executed by a low-level planner for efficient trajectory generation. To train FOM-Nav, we automatically construct large-scale navigation datasets from real-world scanned environments.…
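To make the idea of a map that jointly encodes frontiers and objects more concrete, here is a toy sketch. It is not the paper's implementation: the 2D occupancy grid, the 4-neighbour frontier rule (a free cell bordering an unknown cell), the class names `DetectedObject` and `FrontierObjectMap`, and the radius-based lookup are all assumptions chosen for illustration. The vision-language model and the low-level planner described in the abstract are not modelled here.

```python
from dataclasses import dataclass, field
from math import hypot

# Cell states of a toy occupancy grid (assumed encoding, not from the paper).
UNKNOWN, FREE, OCCUPIED = -1, 0, 1


@dataclass
class DetectedObject:
    label: str    # e.g. "chair"
    cell: tuple   # (row, col) grid position


@dataclass
class FrontierObjectMap:
    """Hypothetical container pairing an occupancy grid with detected objects."""
    grid: list
    objects: list = field(default_factory=list)

    def frontiers(self):
        """Return free cells that touch at least one unknown 4-neighbour."""
        rows, cols = len(self.grid), len(self.grid[0])
        found = []
        for r in range(rows):
            for c in range(cols):
                if self.grid[r][c] != FREE:
                    continue
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and self.grid[nr][nc] == UNKNOWN:
                        found.append((r, c))
                        break
        return found

    def objects_near(self, cell, radius):
        """Labels of objects within `radius` cells (Euclidean) of `cell`."""
        return [
            obj.label
            for obj in self.objects
            if hypot(obj.cell[0] - cell[0], obj.cell[1] - cell[1]) <= radius
        ]


# Small example: the right-hand column is partly unexplored.
fom = FrontierObjectMap(
    grid=[
        [FREE, FREE, UNKNOWN],
        [FREE, OCCUPIED, UNKNOWN],
        [FREE, FREE, FREE],
    ],
    objects=[DetectedObject("chair", (2, 1))],
)
frontier_cells = fom.frontiers()
context = {cell: fom.objects_near(cell, radius=1.5) for cell in frontier_cells}
```

In this sketch, `context` maps each frontier cell to the object labels seen nearby, which is the kind of combined spatial and semantic summary a high-level goal predictor could consume when choosing where to explore next.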
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · Robot Manipulation and Learning
