Beyond Waypoints: Dual-Heatmap Grounding for Cross-Embodiment Semantic Navigation
Kaijie Yun, Yue Chen

TL;DR
This paper introduces a dual-heatmap approach for semantic navigation that predicts continuous reachable regions and orientation constraints, improving robustness and transferability across different robot embodiments.
Contribution
It proposes a unified vision-language framework using dual heatmaps to better model spatial uncertainty and improve navigation success in human-robot interaction.
Findings
Achieves state-of-the-art performance among 8B-parameter models.
Explicit heatmap prediction significantly improves the affordance rate.
Demonstrates robustness across diverse robot embodiments.
Abstract
Grounding open-ended semantic instructions into physically executable local goals is a fundamental challenge in human-robot interaction. While existing navigation frameworks often regress deterministic waypoints, this rigid formulation collapses spatial uncertainty and frequently targets non-traversable object centers, leading to severe execution failures. In this work, we focus on the practical setting of in-FOV semantic navigation, where a robot receives concise, interleaved multimodal (text and image) prompts. To bridge the gap between abstract semantic intent and physical reachability, we propose a unified Vision-Language framework that abandons single-point regression in favor of a Dual-Heatmap representation. Our framework predicts a navigation affordance heatmap that captures continuous reachable regions, coupled with a facing heatmap for orientation constraints. These dense…
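To make the dual-heatmap idea concrete, the sketch below shows one plausible way a downstream planner could turn the two predicted maps into a local goal. It is an illustration under stated assumptions, not the authors' decoding procedure: the function name `decode_local_goal`, the argmax-based selection, and the convention that heading points from the chosen position toward the facing-heatmap peak are all hypothetical.

```python
import numpy as np


def decode_local_goal(affordance, facing):
    """Hypothetical decoder for a pair of same-sized 2D heatmaps.

    Assumptions (not taken from the paper):
      - the goal position is the peak of the affordance heatmap;
      - the robot should face the peak of the facing heatmap;
      - yaw is measured in the image plane, in radians, with 0 along
        increasing column index and positive values toward increasing rows.
    """
    affordance = np.asarray(affordance, dtype=float)
    facing = np.asarray(facing, dtype=float)
    if affordance.ndim != 2 or affordance.shape != facing.shape:
        raise ValueError("heatmaps must be 2D arrays of identical shape")

    # Position: highest-scoring reachable pixel, as (row, col).
    goal_row, goal_col = np.unravel_index(np.argmax(affordance), affordance.shape)
    # Orientation target: highest-scoring pixel in the facing map.
    face_row, face_col = np.unravel_index(np.argmax(facing), facing.shape)

    # Heading from the goal pixel toward the facing target.
    yaw = float(np.arctan2(face_row - goal_row, face_col - goal_col))
    return (int(goal_row), int(goal_col)), yaw


if __name__ == "__main__":
    aff = np.zeros((5, 5))
    aff[3, 1] = 1.0   # reachable spot next to the object
    face = np.zeros((5, 5))
    face[3, 4] = 1.0  # the object the robot should look at
    print(decode_local_goal(aff, face))
```

Unlike a single regressed waypoint, the affordance map keeps every candidate position available, so a planner could fall back to the next-best pixel if the peak turns out to be blocked.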
