UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Socioeconomic Indicator Prediction
Xixuan Hao, Wei Chen, Yibo Yan, Siru Zhong, Kun Wang, Qingsong Wen, Yuxuan Liang

TL;DR
UrbanVLP introduces a multi-granularity vision-language pretraining framework that combines macro satellite and micro street-view data, improving urban socioeconomic prediction accuracy and text quality over prior models.
Contribution
UrbanVLP is presented as the first framework to integrate multi-granularity urban data with automatic text calibration, improving socioeconomic indicator prediction and addressing the macro-only bias and unreliable generated text of existing models.
Findings
Outperforms existing models on six socioeconomic prediction tasks.
Effectively combines macro and micro urban data for better insights.
Produces high-quality, reliable urban image descriptions.
Abstract
Urban socioeconomic indicator prediction aims to infer various metrics related to sustainable development in diverse urban landscapes using data-driven methods. However, prevalent pretrained models, particularly those reliant on satellite imagery, face dual challenges. Firstly, concentrating solely on macro-level patterns from satellite data may introduce bias, lacking nuanced details at micro levels, such as architectural details at a place. Secondly, the text generated by the precursor work UrbanCLIP, which fully utilizes the extensive knowledge of LLMs, frequently exhibits issues such as hallucination and homogenization, resulting in a lack of reliable quality. In response to these issues, we devise a novel framework entitled UrbanVLP based on Vision-Language Pretraining. Our UrbanVLP seamlessly integrates multi-granularity information from both macro (satellite) and micro…
Taxonomy
Topics: Land Use and Ecosystem Services · Remote Sensing and Land Use · Human Mobility and Location-Based Analysis
