Sign Language: Towards Sign Understanding for Robot Autonomy
Ayush Agrawal, Joel Loo, Nicky Zimmerman, David Hsu

TL;DR
This paper introduces the task of navigational sign understanding for robots, proposing a benchmark and baseline using vision-language models to improve robot scene understanding and navigation in complex environments.
Contribution
It presents the first benchmark for navigational sign understanding, including evaluation metrics, a curated dataset, and a baseline approach leveraging vision-language models.
Findings
Vision-language models show promise in interpreting navigational signs.
The benchmark captures signs with varying complexity across diverse environments.
Baseline results demonstrate the feasibility of sign understanding for robot navigation.
Abstract
Navigational signs are common aids for human wayfinding and scene understanding, but are underutilized by robots. We argue that they benefit robot navigation and scene understanding by directly encoding privileged information on actions, spatial regions, and relations. Interpreting signs in open-world settings remains a challenge owing to the complexity of scenes and signs, but recent advances in vision-language models (VLMs) make this feasible. To advance progress in this area, we introduce the task of navigational sign understanding, which parses locations and associated directions from signs. We offer a benchmark for this task, proposing appropriate evaluation metrics and curating a test set capturing signs with varying complexity and design across diverse public spaces, from hospitals to shopping malls to transport hubs. We also provide a baseline approach using VLMs, and…
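To make the task description concrete, here is a minimal sketch of one way a parsed sign could be represented and scored. The abstract says only that the task parses locations and associated directions. The `SignEntry` type, the direction vocabulary, the normalization, and the exact-match precision/recall/F1 scoring are all illustrative assumptions, not the benchmark's actual schema or metrics.

```python
from dataclasses import dataclass

# Hypothetical direction vocabulary; the excerpt does not specify one.
DIRECTIONS = {"left", "right", "straight", "back", "up", "down"}


@dataclass(frozen=True)
class SignEntry:
    """One (location, direction) pair read off a sign (assumed schema)."""
    location: str
    direction: str


def normalize(entry: SignEntry) -> SignEntry:
    """Lower-case and collapse whitespace so trivial differences still match."""
    direction = entry.direction.strip().lower()
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {entry.direction!r}")
    location = " ".join(entry.location.lower().split())
    return SignEntry(location, direction)


def score(predicted, reference):
    """Exact-match precision, recall, and F1 over (location, direction) pairs.

    This is an assumed stand-in metric, not the one proposed in the paper.
    """
    pred = {normalize(e) for e in predicted}
    ref = {normalize(e) for e in reference}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example data, for illustration only.
reference = [SignEntry("Pharmacy", "left"), SignEntry("Radiology", "right")]
predicted = [SignEntry(" pharmacy ", "LEFT"), SignEntry("Cafeteria", "straight")]
precision, recall, f1 = score(predicted, reference)
```

Under this sketch, a prediction counts only when both the location text and the direction agree with the reference after normalization. A real benchmark would likely need more tolerant text matching than exact string equality.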
Taxonomy
Topics: Hand Gesture Recognition Systems
