UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

Baichuan Zhou; Haote Yang; Dairong Chen; Junyan Ye; Tianyi Bai; Jinhua Yu; Songyang Zhang; Dahua Lin; Conghui He; Weijia Li

arXiv:2408.17267·cs.CV·March 11, 2025

TL;DR

UrBench is a new comprehensive benchmark for evaluating large multimodal models in complex multi-view urban scenarios, revealing current models' limitations in urban understanding tasks and cross-view relations.

Contribution

The paper introduces UrBench, a large-scale, multi-view urban benchmark with diverse tasks and data from 11 cities, enabling thorough evaluation of LMMs in urban environments.

Findings

01

Current LMMs perform significantly worse than humans in urban tasks.

02

Even the best models lag behind humans by 17.4% on average.

03

Models show inconsistent behavior across different urban views.

Abstract

Recent evaluations of Large Multimodal Models (LMMs) have explored their capabilities in various domains, but only a few benchmarks focus specifically on urban environments. Moreover, existing urban benchmarks have been limited to evaluating LMMs on basic region-level urban tasks under single views, leading to incomplete evaluations of LMMs' abilities in urban environments. To address these issues, we present UrBench, a comprehensive benchmark designed for evaluating LMMs in complex multi-view urban scenarios. UrBench contains 11.6K meticulously curated questions at both region-level and role-level that cover 4 task dimensions: Geo-Localization, Scene Reasoning, Scene Understanding, and Object Understanding, totaling 14 task types. In constructing UrBench, we utilize data from existing datasets and additionally collect data from 11 cities, creating new annotations using a…
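To make the four-dimension structure concrete, here is a minimal Python sketch of how answers could be scored per task dimension. The record layout `(dimension, predicted, gold)`, the function name `accuracy_by_dimension`, and the toy records are assumptions made for illustration; they are not taken from the UrBench release or its evaluation code.

```python
from collections import defaultdict

# The four task dimensions named in the abstract.
DIMENSIONS = (
    "Geo-Localization",
    "Scene Reasoning",
    "Scene Understanding",
    "Object Understanding",
)


def accuracy_by_dimension(records):
    """Return {dimension: accuracy} for (dimension, predicted, gold) records.

    The record layout is hypothetical, chosen only for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for dimension, predicted, gold in records:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        total[dimension] += 1
        correct[dimension] += int(predicted == gold)
    # Only dimensions that actually appear in the records are reported.
    return {d: correct[d] / total[d] for d in total}


# Made-up toy records, not UrBench data.
toy_records = [
    ("Geo-Localization", "A", "A"),
    ("Geo-Localization", "B", "C"),
    ("Scene Reasoning", "D", "D"),
    ("Object Understanding", "A", "B"),
]
scores = accuracy_by_dimension(toy_records)
```

Reporting accuracy per dimension rather than as a single pooled number keeps weak areas visible, which matters when a benchmark spans tasks as different as localization and object understanding.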

Code & Models

Repositories

opendatalab/urbench
PyTorch · Official

Videos

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

Taxonomy

Topics: Geographic Information Systems Studies · Human Mobility and Location-Based Analysis