SignScene: Visual Sign Grounding for Mapless Navigation

Nicky Zimmerman; Joel Loo; Benjamin Koh; Zishuo Wang; David Hsu

arXiv:2602.12686 · cs.RO · February 16, 2026

Open Access

TL;DR

This paper introduces SignScene, a sign-centric spatial-semantic representation that lets vision-language models ground navigational signs in the local 3D scene. It reaches 88% sign grounding accuracy and enables real-world mapless robot navigation using only signs.

Contribution

The paper proposes SignScene, a new spatial-semantic representation that improves sign grounding for navigation, combining vision-language models with sign-centric scene understanding.

Findings

01 SignScene achieves 88% sign grounding accuracy across diverse environments.

02 It enables real-world mapless navigation using only signs.

03 It outperforms baseline methods on sign grounding tasks.

Abstract

Navigational signs enable humans to navigate unfamiliar environments without maps. This work studies how robots can similarly exploit signs for mapless navigation in the open world. A central challenge lies in interpreting signs: real-world signs are diverse and complex, and their abstract semantic contents need to be grounded in the local 3D scene. We formalize this as sign grounding, the problem of mapping semantic instructions on signs to corresponding scene elements and navigational actions. Recent Vision-Language Models (VLMs) offer the semantic common-sense and reasoning capabilities required for this task, but are sensitive to how spatial information is represented. We propose SignScene, a sign-centric spatial-semantic representation that captures navigation-relevant scene elements and sign information, and presents them to VLMs in a form conducive to effective reasoning. We…
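To make the sign grounding problem statement concrete, the following is a minimal sketch of its inputs and outputs: an instruction read from a sign (a destination plus an arrow) is mapped to one element of the local scene. Everything here is a hypothetical illustration. The class names, the bearing convention, and the nearest-bearing rule are assumptions for this example, and the geometric rule is a toy stand-in for VLM reasoning, not the method described in the paper.

```python
from dataclasses import dataclass

# Assumed convention (not from the paper): bearings are in degrees relative
# to the direction the robot faces when reading the sign.
# 0 = straight ahead, positive = to the left.
ARROW_BEARINGS = {"forward": 0.0, "left": 90.0, "right": -90.0, "back": 180.0}


@dataclass(frozen=True)
class SignInstruction:
    """One semantic instruction parsed from a sign (hypothetical type)."""
    destination: str
    arrow: str  # one of the keys in ARROW_BEARINGS


@dataclass(frozen=True)
class SceneElement:
    """A navigation-relevant element of the local scene (hypothetical type)."""
    name: str
    bearing_deg: float


def angular_gap(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def ground_instruction(instruction: SignInstruction,
                       elements: list[SceneElement]) -> SceneElement:
    """Pick the scene element whose bearing best matches the sign's arrow.

    Toy geometric rule only; it ignores the destination text and any
    semantic reasoning a VLM would contribute.
    """
    target = ARROW_BEARINGS[instruction.arrow]
    return min(elements, key=lambda e: angular_gap(e.bearing_deg, target))


elements = [
    SceneElement("corridor_A", 85.0),
    SceneElement("stairs", -100.0),
    SceneElement("door", 5.0),
]
chosen = ground_instruction(SignInstruction("Library", "left"), elements)
print(chosen.name)
```

The chosen element would then be handed to a local planner as the navigational action. Real signs carry multiple instructions, ambiguous arrows, and text that must be matched against scene semantics, which is where this simple rule stops being adequate.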

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Social Robot Interaction and HRI