SignScene: Visual Sign Grounding for Mapless Navigation
Nicky Zimmerman, Joel Loo, Benjamin Koh, Zishuo Wang, David Hsu

TL;DR
SignScene is a sign-centric spatial-semantic representation that lets vision-language models ground navigational signs in the local 3D scene. It reaches 88% sign grounding accuracy and enables real-world mapless robot navigation from signs alone.
Contribution
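The paper formalizes sign grounding, the problem of mapping semantic instructions on signs to scene elements and navigational actions, and proposes SignScene, a sign-centric spatial-semantic representation that presents navigation-relevant scene elements and sign information to vision-language models in a form conducive to reasoning.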
Findings
Achieves 88% sign grounding accuracy across diverse environments.
Enables real-world mapless navigation using only signs.
Outperforms baseline methods on sign grounding.
Abstract
Navigational signs enable humans to navigate unfamiliar environments without maps. This work studies how robots can similarly exploit signs for mapless navigation in the open world. A central challenge lies in interpreting signs: real-world signs are diverse and complex, and their abstract semantic contents need to be grounded in the local 3D scene. We formalize this as sign grounding, the problem of mapping semantic instructions on signs to corresponding scene elements and navigational actions. Recent Vision-Language Models (VLMs) offer the semantic common-sense and reasoning capabilities required for this task, but are sensitive to how spatial information is represented. We propose SignScene, a sign-centric spatial-semantic representation that captures navigation-relevant scene elements and sign information, and presents them to VLMs in a form conducive to effective reasoning. We…
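To make the sign grounding problem statement concrete, here is a minimal Python sketch of its input/output shape: sign entries (a destination plus an arrow) are matched to scene elements. It is not the paper's method. SignScene relies on a VLM for this step, whereas the sketch swaps in a toy nearest-bearing heuristic. All names (`SignEntry`, `SceneElement`, `ARROW_BEARING`, `ground_sign`) and the bearing convention are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical arrow vocabulary mapped to bearings in degrees
# (assumed convention: 0 = straight ahead, positive = to the left).
ARROW_BEARING = {"forward": 0.0, "left": 90.0, "right": -90.0, "back": 180.0}


@dataclass(frozen=True)
class SignEntry:
    """One line of a navigational sign: a destination and an arrow."""
    destination: str
    arrow: str


@dataclass(frozen=True)
class SceneElement:
    """A navigation-relevant scene element (e.g. a corridor) and its bearing."""
    name: str
    bearing_deg: float


def angular_gap(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def ground(entry: SignEntry, elements: list[SceneElement]) -> SceneElement:
    """Toy stand-in for grounding: pick the element closest to the arrow's bearing."""
    target = ARROW_BEARING[entry.arrow]
    return min(elements, key=lambda e: angular_gap(e.bearing_deg, target))


def ground_sign(entries: list[SignEntry], elements: list[SceneElement]) -> dict[str, str]:
    """Map each destination on the sign to the name of a scene element."""
    return {entry.destination: ground(entry, elements).name for entry in entries}


# Made-up example scene and sign, for illustration only.
scene = [
    SceneElement("corridor_left", 85.0),
    SceneElement("corridor_ahead", 5.0),
    SceneElement("stairs_right", -100.0),
]
sign = [
    SignEntry("Library", "left"),
    SignEntry("Cafeteria", "forward"),
    SignEntry("Exit", "right"),
]
grounding = ground_sign(sign, scene)
```

The heuristic only handles clean arrows and known bearings. The abstract's point is that real signs are diverse and complex, which is why the paper turns to VLMs and a dedicated scene representation instead.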
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Social Robot Interaction and HRI
