The Urban Vision Hackathon Dataset and Models: Towards Image Annotations and Accurate Vision Models for Indian Traffic

Akash Sharma; Chinmay Mhatre; Sankalp Gawali; Ruthvik Bokkasam; Brij Kishore; Vishwajeet Pattanaik; Tarun Rambha; Abdul R. Pinjari; Vijay Kovvali; Anirban Chakraborty; Punit Rathore; Raghu Krishnapuram; Yogesh Simmhan

arXiv:2511.02563·cs.CV·November 5, 2025

Open Access · 2 Models · 5 Datasets

TL;DR

This paper introduces UVH-26, a large annotated Indian traffic dataset, and trains models that significantly outperform baseline models on domain-specific traffic detection tasks.

Contribution

The paper presents UVH-26, the first public release by AIM@IISc of a large-scale annotated dataset of Indian traffic-camera images, and reports improved detection accuracy for domain-specific models trained on it.

Findings

01

Models trained on UVH-26 outperform baseline models by 8.4-31.5% in mAP50:95.

02

RT-DETR-X achieves 0.67 mAP50:95, surpassing COCO-trained models.

03

UVH-26 captures Indian urban traffic heterogeneity, enabling better detection in complex scenarios.
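The mAP50:95 figures above depend on how closely a predicted box overlaps a ground-truth box. The sketch below assumes the COCO-style convention, in which average precision is averaged over ten intersection-over-union (IoU) thresholds from 0.50 to 0.95. It is an illustration only, not the paper's evaluation code; the function names and the `(x1, y1, x2, y2)` box format are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Thresholds at which the prediction would count as a match."""
    value = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if value >= t]
```

A prediction that overlaps its target loosely counts as a match only at the lower thresholds, which is why mAP50:95 rewards tight localization more than mAP50 does.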

Abstract

This report describes the UVH-26 dataset, the first public release by AIM@IISc of a large-scale dataset of annotated traffic-camera images from India. The dataset comprises 26,646 high-resolution (1080p) images sampled from 2,800 of Bengaluru's Safe-City CCTV cameras over a 4-week period and subsequently annotated through a crowdsourced hackathon involving 565 college students from across India. In total, 1.8 million bounding boxes were labeled across 14 vehicle classes specific to India: Cycle, 2-Wheeler (Motorcycle), 3-Wheeler (Auto-rickshaw), LCV (Light Commercial Vehicle), Van, Tempo-traveller, Hatchback, Sedan, SUV, MUV, Mini-bus, Bus, Truck and Other. From these, 283k-316k consensus ground-truth bounding boxes and labels were derived for distinct objects in the 26k images using the Majority Voting and STAPLE algorithms. Further, we train multiple contemporary detectors, including…
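The abstract names Majority Voting as one of the two consensus methods but does not spell out the procedure. As a rough illustration of the general idea only, the sketch below picks a class label for one object from several annotators' labels. The function name, the strict-majority rule, and the assumption that boxes from different annotators have already been matched to the same object are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_label(labels, min_fraction=0.5):
    """Return the label chosen by a strict majority of annotators, else None.

    `labels` holds one class name per annotator for a single object.
    `min_fraction` is a hypothetical cutoff: the winning label must be
    chosen by more than this share of annotators.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) > min_fraction else None
```

For example, `majority_label(["Sedan", "Sedan", "Hatchback"])` returns `"Sedan"`, while a two-way split such as `["SUV", "MUV"]` returns `None` because neither label has a strict majority.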

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis