Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception

Yiwei Ou; Chung Ching Cheung; Jun Yang Ang; Xiaobin Ren; Ronggui Sun; Guansong Gao; Kaiqi Zhao; Manfredo Manfredini

arXiv:2605.09936 · cs.CV · May 12, 2026

TL;DR

Urban-ImageNet is a comprehensive multi-modal dataset and benchmark for urban space perception, enabling evaluation of AI models across urban scene understanding, image-text retrieval, and object segmentation.

Contribution

The paper introduces a large-scale, theory-grounded urban dataset organized by a hierarchical 10-class taxonomy (HUSIC) and covering multiple tasks, giving AI models a dedicated benchmark for urban space perception.

Findings

01 High performance in supervised scene classification

02 Challenges in cross-modal retrieval and object segmentation

03 Model performance improves with increased training data

Abstract

We present Urban-ImageNet, a large-scale multi-modal dataset and evaluation benchmark for urban space perception from user-generated social media imagery. The corpus contains over 2 million public social media images and paired textual posts collected from Weibo at 61 urban sites in 24 Chinese cities between 2019 and 2025, with controlled benchmark subsets at 1K, 10K, and 100K scale and a full 2M corpus for large-scale training and evaluation. Urban-ImageNet is organized by HUSIC, a Hierarchical Urban Space Image Classification framework that defines a 10-class taxonomy grounded in urban theory. The taxonomy is designed to distinguish activated and non-activated public spaces, exterior and interior urban environments, accommodation spaces, consumption content, portraits, and non-spatial social-media content. Rather than treating urban imagery as generic scene data, Urban-ImageNet…
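
The abstract mentions controlled benchmark subsets at 1K, 10K, and 100K scale but does not say here how they are drawn. As a rough illustration only, the sketch below shows one common way to build such subsets: class-balanced and nested, so that every smaller subset is contained in the larger ones. The function name `nested_subsets`, the integer class ids, and the balancing rule are all assumptions made for this example, not the authors' procedure.

```python
import random
from collections import defaultdict


def nested_subsets(labels, sizes=(1_000, 10_000, 100_000), seed=0):
    """Return {size: sorted indices} of class-balanced, nested subsets.

    Hypothetical helper: `labels` is one class id per image. The same
    per-class shuffle is reused for every size, so each smaller subset
    is a subset of every larger one.
    """
    rng = random.Random(seed)

    # Group image indices by class, then shuffle each group once.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)

    classes = sorted(by_class)
    subsets = {}
    for size in sorted(sizes):
        per_class = size // len(classes)
        chosen = []
        for c in classes:
            # Take a prefix of the shuffled list; a class with fewer than
            # `per_class` images simply contributes everything it has.
            chosen.extend(by_class[c][:per_class])
        subsets[size] = sorted(chosen)
    return subsets


if __name__ == "__main__":
    # Synthetic stand-in: 2,000 images over 10 placeholder class ids.
    labels = [i % 10 for i in range(2_000)]
    subsets = nested_subsets(labels, sizes=(10, 100, 1_000))
    print({size: len(idx) for size, idx in subsets.items()})
```

Nesting the subsets this way keeps comparisons across scales cleaner, since a model trained on the larger subset has seen everything in the smaller one; whether the released Urban-ImageNet subsets are nested or balanced is not stated in the text above.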

Code & Models

Repositories

yiasun/dataset-2
github

Datasets

Yiwei-Ou/Urban-ImageNet
dataset· 488 dl
