PlanGPT-VL: Enhancing Urban Planning with Domain-Specific Vision-Language Models

He Zhu; Junyou Su; Minxin Chen; Wen Wang; Yijie Deng; Guanhua Chen; Wenjia Zhang

arXiv:2505.14481·cs.CL·May 22, 2025

Open Access

TL;DR

PlanGPT-VL is a vision-language model specialized for urban planning maps. It aims to make planning-map analysis more accurate and reliable through VQA data synthesis (PlanAnno-V), structured verification to reduce hallucinations, and supervised fine-tuning with a frozen vision encoder.

Contribution

We introduce PlanGPT-VL, the first domain-specific VLM for urban planning maps, with novel data synthesis, verification, and training techniques tailored for spatial understanding.

Findings

01. Outperforms general VLMs on planning map tasks

02. Achieves high accuracy with only 7B parameters

03. Provides a reliable tool for urban planning analysis

Abstract

In the field of urban planning, existing Vision-Language Models (VLMs) frequently fail to effectively analyze and evaluate planning maps, despite the critical importance of these visual elements for urban planners and related educational contexts. Planning maps, which visualize land use, infrastructure layouts, and functional zoning, require specialized understanding of spatial configurations, regulatory requirements, and multi-scale analysis. To address this challenge, we introduce PlanGPT-VL, the first domain-specific Vision-Language Model tailored specifically for urban planning maps. PlanGPT-VL employs three innovative approaches: (1) PlanAnno-V framework for high-quality VQA data synthesis, (2) Critical Point Thinking to reduce hallucinations through structured verification, and (3) comprehensive training methodology combining Supervised Fine-Tuning with frozen vision encoder…

Taxonomy

Topics: 3D Modeling in Geospatial Applications · Geographic Information Systems Studies