The 9th AI City Challenge

Zheng Tang; Shuo Wang; David C. Anastasiu; Ming-Ching Chang; Anuj Sharma; Quan Kong; Norimasa Kobori; Munkhjargal Gochoo; Ganzorig Batnasan; Munkh-Erdene Otgonbold; Fady Alnajjar; Jun-Wei Hsieh; Tomasz Kornuta; Xiaolong Li; Yilin Zhao; Han Zhang; Subhashree Radhakrishnan; Arihant Jain; Ratnesh Kumar; Vidya N. Murali; Yuxing Wang; Sameer Satish Pusegaonkar; Yizhou Wang; Sujit Biswas; Xunlei Wu; Zhedong Zheng; Pranamesh Chakraborty; Rama Chellappa

arXiv:2508.13564·cs.CV·August 20, 2025

TL;DR

The 9th AI City Challenge (2025) ran four tracks spanning transportation, industrial automation, and public safety, covering multi-camera tracking, incident understanding, spatial reasoning, and real-time detection. It drew 245 registered teams from 15 countries, a 17% increase in participation, and released new public datasets.

Contribution

This edition introduced diverse tracks with novel datasets and benchmarks, promoting progress in multi-camera tracking, incident analysis, spatial reasoning, and edge-efficient detection in urban environments.

Findings

01 Achieved new state-of-the-art results in multiple tracks.

02 Released the challenge datasets publicly, with over 30,000 downloads to date.

03 Enhanced benchmarks for real-world AI applications in city environments.

Abstract

The ninth AI City Challenge continues to advance real-world applications of computer vision and AI in transportation, industrial automation, and public safety. The 2025 edition featured four tracks and saw a 17% increase in participation, with 245 teams from 15 countries registered on the evaluation server. Public release of challenge datasets led to over 30,000 downloads to date. Track 1 focused on multi-class 3D multi-camera tracking, involving people, humanoids, autonomous mobile robots, and forklifts, using detailed calibration and 3D bounding box annotations. Track 2 tackled video question answering in traffic safety, with multi-camera incident understanding enriched by 3D gaze labels. Track 3 addressed fine-grained spatial reasoning in dynamic warehouse environments, requiring AI systems to interpret RGB-D inputs and answer spatial questions that combine perception, geometry, and…
