Sign Language: Towards Sign Understanding for Robot Autonomy

Ayush Agrawal; Joel Loo; Nicky Zimmerman; David Hsu

arXiv:2506.02556 · cs.RO · September 17, 2025

TL;DR

This paper introduces the task of navigational sign understanding for robots, proposing a benchmark and baseline using vision-language models to improve robot scene understanding and navigation in complex environments.

Contribution

It presents the first benchmark for navigational sign understanding, including evaluation metrics, a curated dataset, and a baseline approach leveraging vision-language models.

Findings

1. Vision-language models show promise in interpreting navigational signs.
2. The benchmark captures signs with varying complexity across diverse environments.
3. Baseline results demonstrate the feasibility of sign understanding for robot navigation.

Abstract

Navigational signs are common aids for human wayfinding and scene understanding, but are underutilized by robots. We argue that they benefit robot navigation and scene understanding, by directly encoding privileged information on actions, spatial regions, and relations. Interpreting signs in open-world settings remains a challenge owing to the complexity of scenes and signs, but recent advances in vision-language models (VLMs) make this feasible. To advance progress in this area, we introduce the task of navigational sign understanding which parses locations and associated directions from signs. We offer a benchmark for this task, proposing appropriate evaluation metrics and curating a test set capturing signs with varying complexity and design across diverse public spaces, from hospitals to shopping malls to transport hubs. We also provide a baseline approach using VLMs, and…
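To make the task statement concrete, the sketch below shows one way a parsed sign could be represented as a set of (location, direction) pairs and scored against a reference annotation. It is a minimal illustration only: the names `SignEntry` and `precision_recall`, the lowercase normalization, and the exact-match precision/recall scoring are assumptions made here, not the metrics or data format defined by the paper's benchmark.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class SignEntry:
    """One hypothetical parsed item from a sign: a place and where it points."""
    location: str
    direction: str


def _normalize(entry: SignEntry) -> Tuple[str, str]:
    # Compare case-insensitively and ignore surrounding whitespace.
    return entry.location.strip().lower(), entry.direction.strip().lower()


def precision_recall(
    predicted: Iterable[SignEntry], reference: Iterable[SignEntry]
) -> Tuple[float, float]:
    """Exact-match precision and recall over (location, direction) pairs.

    This is an illustrative scoring rule, not the benchmark's own metric.
    """
    pred: Set[Tuple[str, str]] = {_normalize(e) for e in predicted}
    ref: Set[Tuple[str, str]] = {_normalize(e) for e in reference}
    matched = len(pred & ref)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    return precision, recall


# Made-up example: a sign listing three destinations.
reference = [
    SignEntry("Pharmacy", "left"),
    SignEntry("Exit", "straight"),
    SignEntry("Toilets", "right"),
]
# A hypothetical model output that gets one pair right and one direction wrong.
predicted = [
    SignEntry("pharmacy", "left"),
    SignEntry("Exit", "right"),
]

precision, recall = precision_recall(predicted, reference)
```

Under this scoring rule a pair counts only if both the location and the direction agree, so a correctly read location with the wrong arrow is treated as a miss.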

Code & Models

Datasets

NickyZimmerman/SiGNgapore2D
dataset · 21 dl

Taxonomy

Topics: Hand Gesture Recognition Systems