UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web

Yibo Yan; Haomin Wen; Siru Zhong; Wei Chen; Haodong Chen; Qingsong Wen; Roger Zimmermann; Yuxuan Liang

arXiv:2310.18340 · cs.CL · March 26, 2024 · 1 citation

TL;DR

UrbanCLIP introduces a framework that pairs satellite imagery with LLM-generated textual descriptions to enhance urban region profiling, improving R² for urban indicator prediction by an average of 6.1% over existing methods.

Contribution

This paper is the first to bring the textual modality, via LLMs, into urban imagery profiling, yielding the first LLM-enhanced contrastive learning framework for this task.

Findings

01

Achieved an average 6.1% improvement in R² for urban indicator prediction

02

Demonstrated the effectiveness of text-image joint learning in urban profiling

03

Provided a new dataset and code for future research
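
The headline finding is stated in terms of R², the coefficient of determination. This page does not say whether the 6.1% is a relative gain or an absolute gain in percentage points, so the hypothetical helpers below (names are illustrative, not from the paper or its repository) compute R² and both readings of "improvement", as a minimal sketch.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def relative_improvement(baseline_r2: float, new_r2: float) -> float:
    """Relative gain, e.g. 0.06 means 6% better than the baseline (assumed reading)."""
    return (new_r2 - baseline_r2) / baseline_r2


def absolute_gain_points(baseline_r2: float, new_r2: float) -> float:
    """Absolute gain in percentage points of R² (alternative reading)."""
    return (new_r2 - baseline_r2) * 100.0
```

For example, a hypothetical baseline R² of 0.50 rising to 0.53 is a 6% relative gain but only 3 percentage points absolute, which is why the definition behind a reported percentage matters.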

Abstract

Urban region profiling from web-sourced data is of great importance for urban planning and sustainable development. Large Language Models (LLMs) are increasingly applied across fields, especially in multi-modal research such as vision-language learning, where the text modality supplements the image. Since the textual modality has never been included among the modality combinations used in urban region profiling, we aim to answer two fundamental questions in this paper: i) Can the textual modality enhance urban region profiling? ii) If so, in what ways and with regard to which aspects? To answer these questions, we leverage the power of LLMs and introduce the first-ever LLM-enhanced framework that integrates the knowledge of the textual modality into urban imagery profiling, named LLM-enhanced Urban Region Profiling with Contrastive Language-Image…
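
The abstract describes contrastive language-image pretraining, in which each satellite image is paired with an LLM-generated description. This page does not give the training objective, so the sketch below assumes the standard symmetric InfoNCE loss popularized by CLIP: within a batch, each matched image-text pair is treated as the positive and every other pairing as a negative. The function names and the fixed temperature of 0.07 are illustrative assumptions, not taken from the stupidbuluchacha/urbanclip repository, and UrbanCLIP's actual objective may include further terms.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row: List[float]) -> List[float]:
    """Numerically stable log-softmax (subtract the max before exponentiating)."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_contrastive_loss(image_emb: List[Vector], text_emb: List[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: image i and text i form the positive pair.

    Assumption: a CLIP-style objective with a fixed (not learned) temperature.
    """
    if len(image_emb) != len(text_emb) or not image_emb:
        raise ValueError("need one text embedding per image")
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cosine similarity(image i, text j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text direction: each row is a distribution over the batch's texts.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: each column is a distribution over the batch's images.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    print(clip_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs
    print(clip_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs
```

With these toy inputs, the matched pairing gives a loss close to zero, while the swapped pairing gives roughly 1/temperature ≈ 14.3, which is the signal that pushes matched image and text embeddings together during pretraining.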

Code & Models

Repositories

stupidbuluchacha/urbanclip
PyTorch · Official

Taxonomy

Topics: Geographic Information Systems Studies · Human Mobility and Location-Based Analysis · Text and Document Classification Technologies