Urban In-Context Learning: Bridging Pretraining and Inference through Masked Diffusion for Urban Profiling
Ruixing Zhang, Bo Wang, Tongyu Zhu, Leilei Sun, Weifeng Lv

TL;DR
This paper introduces Urban In-Context Learning, a unified framework using masked autoencoding and diffusion models to predict urban profiles directly from data, outperforming traditional two-stage methods.
Contribution
It proposes a novel one-stage urban profiling model that unifies pretraining and inference through masked diffusion transformers and introduces a stabilization mechanism for training.
Findings
Outperforms state-of-the-art two-stage approaches on urban profiling tasks.
Demonstrates the effectiveness of diffusion modeling in urban data prediction.
Validates the proposed modules through extensive experiments and case studies.
Abstract
Urban profiling aims to predict urban profiles in unknown regions and plays a critical role in economic and social censuses. Existing approaches typically follow a two-stage paradigm: first, learning representations of urban areas; second, performing downstream prediction via linear probing, a practice that originates from the BERT era. Following the development of GPT-style models, recent studies have shown that novel self-supervised pretraining schemes can endow models with direct applicability to downstream tasks, thereby eliminating the need for task-specific fine-tuning. This is largely because GPT unifies the form of pretraining and inference through next-token prediction. However, urban data exhibit structural characteristics that differ fundamentally from language, making it challenging to design a one-stage model that unifies both pretraining and inference. In this work, we propose…
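For readers unfamiliar with the two-stage baseline that the abstract contrasts against, the following is a minimal sketch of linear probing, not the paper's code. Everything in it is an assumption made for illustration: the region embeddings are random stand-ins for the output of a frozen pretrained encoder, the urban indicator is synthetic, and the helper names `fit_linear_probe` and `predict_linear_probe` are hypothetical.

```python
import numpy as np


def fit_linear_probe(embeddings, targets, l2=1e-3):
    """Fit a ridge-regularized linear map on frozen embeddings (closed form)."""
    # Append a constant column so the probe also learns an intercept.
    features = np.hstack([embeddings, np.ones((embeddings.shape[0], 1))])
    # Solve (X^T X + l2 * I) w = X^T y for the probe weights.
    gram = features.T @ features + l2 * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)


def predict_linear_probe(weights, embeddings):
    """Apply the fitted probe to embeddings of regions without labels."""
    features = np.hstack([embeddings, np.ones((embeddings.shape[0], 1))])
    return features @ weights


rng = np.random.default_rng(0)

# Stage 1 stand-in (assumption): embeddings from a frozen, pretrained encoder.
region_embeddings = rng.normal(size=(200, 16))

# Synthetic urban indicator (assumption): a noisy linear function of the embeddings.
true_weights = rng.normal(size=16)
indicator = region_embeddings @ true_weights + 0.01 * rng.normal(size=200)

# Stage 2: fit the probe on regions with known profiles, predict the rest.
known, unknown = slice(0, 150), slice(150, 200)
weights = fit_linear_probe(region_embeddings[known], indicator[known])
predictions = predict_linear_probe(weights, region_embeddings[unknown])
```

The point of the sketch is the separation of concerns: the encoder is trained once without labels, and a separate supervised step is still required before any prediction can be made. A one-stage model of the kind the abstract motivates would remove that second fitting step.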
