Mapillary Vistas Validation for Fine-Grained Traffic Signs: A Benchmark Revealing Vision-Language Model Limitations
Sparsh Garg, Abhishek Aich

TL;DR
This paper introduces a new fine-grained traffic sign dataset derived from Mapillary, benchmarks vision-language models on it, and reveals current limitations in fine-grained visual understanding for autonomous driving.
Contribution
It creates a detailed, expert-annotated validation dataset for traffic signs and evaluates state-of-the-art models, highlighting their limitations in fine-grained recognition.
Findings
The self-supervised DINOv2 model outperforms all vision-language model (VLM) baselines on traffic sign recognition.
Current models show significant limitations in fine-grained visual understanding.
The dataset enables more reliable and interpretable perception systems for autonomous driving.
Abstract
Obtaining high-quality fine-grained annotations for traffic signs is critical for accurate and safe decision-making in autonomous driving. Widely used datasets, such as Mapillary, often provide only coarse-grained labels, without distinguishing semantically important types such as stop signs or speed limit signs. To this end, we present a new validation set for traffic signs derived from the Mapillary dataset called Mapillary Vistas Validation for Traffic Signs (MVV), where we decompose composite traffic signs into granular, semantically meaningful categories. The dataset includes pixel-level instance masks and has been manually annotated by expert annotators to ensure label fidelity. Further, we benchmark several state-of-the-art VLMs against the self-supervised DINOv2 model on this dataset and show that DINOv2 consistently outperforms all VLM baselines, not only on traffic sign…
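To illustrate why coarse-grained labels can hide fine-grained errors, here is a minimal sketch that scores the same predictions at both label granularities. The label names, the `FINE_TO_COARSE` mapping, and the sample predictions are hypothetical and chosen only for illustration; they are not taken from MVV, and this is not the paper's evaluation protocol.

```python
from collections import defaultdict

# Hypothetical fine-grained labels, all collapsing to one coarse class.
FINE_TO_COARSE = {
    "stop": "traffic-sign",
    "speed-limit": "traffic-sign",
    "yield": "traffic-sign",
}


def accuracy(truth, predicted):
    """Fraction of positions where the predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(truth, predicted))
    return correct / len(truth)


def per_class_accuracy(truth, predicted):
    """Accuracy computed separately for each true label."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(truth, predicted):
        totals[t] += 1
        hits[t] += int(t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Made-up example: one stop sign is confused with a yield sign.
truth = ["stop", "stop", "speed-limit", "yield"]
predicted = ["stop", "yield", "speed-limit", "yield"]

fine_acc = accuracy(truth, predicted)
coarse_acc = accuracy(
    [FINE_TO_COARSE[t] for t in truth],
    [FINE_TO_COARSE[p] for p in predicted],
)
per_class = per_class_accuracy(truth, predicted)

print(f"fine-grained accuracy:   {fine_acc:.2f}")
print(f"coarse-grained accuracy: {coarse_acc:.2f}")
print(f"per-class accuracy:      {per_class}")
```

In this toy case the coarse score is perfect even though a stop sign was misread, which is the kind of error a fine-grained validation set is meant to expose.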
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning
