UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Socioeconomic Indicator Prediction

Xixuan Hao; Wei Chen; Yibo Yan; Siru Zhong; Kun Wang; Qingsong Wen; Yuxuan Liang

arXiv:2403.16831 · cs.CV · January 23, 2025 · 5 cites

TL;DR

UrbanVLP introduces a multi-granularity vision-language pretraining framework that combines macro satellite and micro street-view data, improving urban socioeconomic prediction accuracy and text quality over prior models.

Contribution

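UrbanVLP is presented as the first framework to integrate multi-granularity urban imagery (satellite and street-view) with automatic text calibration. It targets two limitations of prior models: the lack of micro-level detail in satellite-only inputs, and the hallucination and homogenization of LLM-generated image descriptions.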

Findings

01. Outperforms existing models on six socioeconomic prediction tasks.

02. Combines macro-level (satellite) and micro-level (street-view) urban data in a single model.

03. Produces higher-quality, more reliable urban image descriptions than prior models.

Abstract

Urban socioeconomic indicator prediction aims to infer various metrics related to sustainable development in diverse urban landscapes using data-driven methods. However, prevalent pretrained models, particularly those reliant on satellite imagery, face dual challenges. Firstly, concentrating solely on macro-level patterns from satellite data may introduce bias, lacking nuanced details at micro levels, such as architectural details at a place. Secondly, the text generated by the precursor work UrbanCLIP, which fully utilizes the extensive knowledge of LLMs, frequently exhibits issues such as hallucination and homogenization, resulting in a lack of reliable quality. In response to these issues, we devise a novel framework entitled UrbanVLP based on Vision-Language Pretraining. Our UrbanVLP seamlessly integrates multi-granularity information from both macro (satellite) and micro…

Taxonomy

Topics: Land Use and Ecosystem Services · Remote Sensing and Land Use · Human Mobility and Location-Based Analysis