SignReasoner: Compositional Reasoning for Complex Traffic Sign Understanding via Functional Structure Units
Ruibin Wang, Zhenyu Lin, Xinhai Zhao

TL;DR
SignReasoner introduces a compositional reasoning framework using Functional Structure Units to improve complex traffic sign understanding in vision-language models, enabling better generalization and data efficiency.
Contribution
It proposes a novel FSU-based decomposition method and a two-stage post-training pipeline to enhance VLMs' traffic sign reasoning without changing their architecture.
Findings
SignReasoner achieves state-of-the-art results on TrafficSignEval.
The method improves generalization to unseen sign configurations.
It enhances reasoning accuracy with high data efficiency.
Abstract
Accurate semantic understanding of complex traffic signs, including those with intricate layouts, multilingual text, and composite symbols, is critical for autonomous driving safety. Current models, both specialized small ones and large Vision Language Models (VLMs), suffer from a significant bottleneck: a lack of compositional generalization, which leads to failure on novel sign configurations. To overcome this, we propose SignReasoner, a novel paradigm that transforms general VLMs into expert traffic sign reasoners. Our core innovation is the Functional Structure Unit (FSU), which shifts from common instance-based modeling to flexible function-based decomposition. By breaking down complex signs into minimal, core functional blocks (e.g., Direction, Notice, Lane), our model learns the underlying structural grammar, enabling robust generalization to unseen compositions. We define…
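To make the idea of function-based decomposition concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a sign can be represented as a list of typed units and that a "composition" is simply the multiset of unit types it contains. The class names `FSUType`, `FSU`, and `Sign`, the `attributes` field, and the helper `is_novel_composition` are all hypothetical; only the three unit names (Direction, Notice, Lane) come from the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum


class FSUType(Enum):
    # Only the three unit types named in the abstract; the full set is not given there.
    DIRECTION = "Direction"
    NOTICE = "Notice"
    LANE = "Lane"


@dataclass
class FSU:
    kind: FSUType
    # Hypothetical free-form attributes, e.g. {"arrow": "left", "text": "Airport"}.
    attributes: dict = field(default_factory=dict)


@dataclass
class Sign:
    units: list = field(default_factory=list)

    def signature(self):
        """Return the sorted unit-type names, used here as the sign's composition."""
        return tuple(sorted(u.kind.value for u in self.units))


def is_novel_composition(sign, seen_signatures):
    """True if this combination of unit types did not occur in the seen set."""
    return sign.signature() not in seen_signatures


# Two "training" signs and one sign that recombines known unit types in a new way.
train = [
    Sign([FSU(FSUType.DIRECTION), FSU(FSUType.NOTICE)]),
    Sign([FSU(FSUType.LANE)]),
]
seen = {s.signature() for s in train}
unseen = Sign([FSU(FSUType.DIRECTION), FSU(FSUType.LANE)])
print(is_novel_composition(unseen, seen))  # True
```

In this toy setup every unit type in `unseen` has appeared before, but the combination has not, which is the kind of unseen composition the abstract refers to. An instance-based representation would treat the whole sign as one new class instead.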
