Beyond Waypoints: Dual-Heatmap Grounding for Cross-Embodiment Semantic Navigation

Kaijie Yun; Yue Chen

arXiv:2605.19420 · cs.RO · May 20, 2026

TL;DR

This paper introduces a dual-heatmap approach for semantic navigation that predicts continuous reachable regions and orientation constraints, improving robustness and transferability across different robot embodiments.

Contribution

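The paper proposes a unified vision-language framework that replaces single-point waypoint regression with two heatmaps, one for reachable regions and one for orientation, to better model spatial uncertainty and improve navigation success in human-robot interaction.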

Findings

01

Achieves state-of-the-art performance among 8B-parameter models.

02

Explicit heatmap prediction significantly improves affordance rate.

03

Demonstrates robustness across diverse robot embodiments.

Abstract

Grounding open-ended semantic instructions into physically executable local goals is a fundamental challenge in human-robot interaction. While existing navigation frameworks often regress deterministic waypoints, this rigid formulation collapses spatial uncertainty and frequently targets non-traversable object centers, leading to severe execution failures. In this work, we focus on the practical setting of in-FOV semantic navigation, where a robot receives concise, interleaved multimodal (text and image) prompts. To bridge the gap between abstract semantic intent and physical reachability, we propose a unified Vision-Language framework that abandons single-point regression in favor of a Dual-Heatmap representation. Our framework predicts a navigation affordance heatmap that captures continuous reachable regions, coupled with a facing heatmap for orientation constraints. These dense…
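To make the dual-heatmap idea concrete, here is a minimal sketch of how a goal pose might be read out of the two maps. It is an illustration under stated assumptions, not the authors' decoder: the abstract does not say how the heatmaps are turned into a pose, so the peak-picking rule, the helper names `argmax_2d` and `decode_goal`, and the toy 4x5 grids are all hypothetical. The sketch assumes both heatmaps share one image-space grid, takes the peak of the navigation affordance heatmap as the place to stand, and takes the peak of the facing heatmap as the point to look at.

```python
import math


def argmax_2d(heatmap):
    """Return (row, col) of the largest value in a 2D list (first hit wins ties)."""
    best = (0, 0)
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > heatmap[best[0]][best[1]]:
                best = (r, c)
    return best


def decode_goal(nav_heatmap, facing_heatmap):
    """Hypothetical decoder: stand at the nav peak, face the facing peak."""
    goal_r, goal_c = argmax_2d(nav_heatmap)
    face_r, face_c = argmax_2d(facing_heatmap)
    # Image rows grow downward, so the row difference is flipped to get a
    # counter-clockwise yaw where 0 points along increasing column index.
    yaw = math.atan2(goal_r - face_r, face_c - goal_c)
    return (goal_r, goal_c), yaw


# Toy navigation affordance heatmap: mass on floor cells in the bottom-left.
nav = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.3, 0.0, 0.0],
]
# Toy facing heatmap: mass on the object the robot should look at.
facing = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.8, 0.1],
    [0.0, 0.0, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

goal, yaw = decode_goal(nav, facing)
print(goal, round(math.degrees(yaw), 1))
```

In this toy case the standing cell and the facing target are different cells, which is the distinction the abstract draws against regressing a single waypoint at an object center. A real system would also need to project the chosen pixel into the robot's frame, which this sketch leaves out.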
