CMAB: A First National-Scale Multi-Attribute Building Dataset in China Derived from Open Source Data and GeoAI
Yecheng Zhang, Huimin Zhao, Ying Long

TL;DR
This paper presents the first comprehensive national-scale multi-attribute building dataset in China, created using GeoAI and multi-source data, enabling improved urban analysis and planning.
Contribution
It introduces CMAB, a large-scale multi-attribute building dataset covering 3,667 spatial cities across China, built with a GeoAI framework that integrates multi-source open data.
Findings
Covers 29 million buildings and 21.3 billion m² of rooftop area
Achieves an F1-score of 89.93% for OCRNet-based building extraction
Most attribute predictions are validated at above 80% accuracy
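The 89.93% extraction figure is an F1-score, the harmonic mean of precision and recall. The sketch below shows how that metric is computed from true positives (TP), false positives (FP), and false negatives (FN). It assumes pixel-level comparison of binary rooftop masks, which this summary does not specify (the paper may instead match buildings at the object level), and the helper names `mask_counts` and `f1_score` are hypothetical.

```python
def mask_counts(pred, truth):
    """Count TP/FP/FN over two equally sized binary masks (lists of 0/1 rows).

    Assumption: evaluation is pixel-level; object-level matching would differ.
    """
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1  # predicted rooftop that is a real rooftop
            elif p and not t:
                fp += 1  # predicted rooftop where there is none
            elif t and not p:
                fn += 1  # real rooftop the model missed
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """F1 = 2PR / (P + R), where P = TP/(TP+FP) and R = TP/(TP+FN)."""
    if tp == 0:
        return 0.0  # no correct detections: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    truth = [[1, 0, 0], [0, 1, 1]]
    tp, fp, fn = mask_counts(pred, truth)
    print(tp, fp, fn, round(f1_score(tp, fp, fn), 4))
```

In this toy example there are 2 TP, 1 FP, and 1 FN, so precision and recall are both 2/3 and F1 is 2/3 (about 0.6667). The same value follows from the equivalent form F1 = 2TP / (2TP + FP + FN).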
Abstract
Rapidly acquiring three-dimensional (3D) building data, including geometric attributes like rooftop, height and orientations, as well as indicative attributes like function, quality, and age, is essential for accurate urban analysis, simulations, and policy updates. Current building datasets suffer from incomplete coverage of building multi-attributes. This paper introduces a geospatial artificial intelligence (GeoAI) framework for large-scale building modeling, presenting the first national-scale Multi-Attribute Building dataset (CMAB), covering 3,667 spatial cities, 29 million buildings, and 21.3 billion square meters of rooftops with an F1-Score of 89.93% in OCRNet-based extraction, totaling 337.7 billion cubic meters of building stock. We trained bootstrap aggregated XGBoost models with city administrative classifications, incorporating features such as morphology, location, and…
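The abstract mentions bootstrap-aggregated (bagged) XGBoost models for attribute prediction. As a rough illustration of the bagging scheme only, the sketch below fits several base learners on bootstrap resamples (drawn with replacement) and combines them by majority vote. To stay dependency-free it swaps XGBoost for a one-feature decision stump; `fit_stump`, `bagging_fit`, `bagging_predict`, the single numeric feature, and the "low"/"high" labels are all hypothetical and do not reflect the authors' features, targets, or per-city-class training setup.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-feature decision stump (stand-in for XGBoost, an assumption).

    Tries each observed value as a threshold and keeps the one whose
    majority-label leaves make the fewest training errors.
    """
    best = None
    for thr in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= thr]  # never empty
        right = [y for x, y in zip(xs, ys) if x > thr]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, thr, l_lab, r_lab)
    _, thr, l_lab, r_lab = best
    return lambda x: l_lab if x <= thr else r_lab


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate the ensemble by majority vote (classification)."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


if __name__ == "__main__":
    xs = list(range(1, 11))
    ys = ["low"] * 5 + ["high"] * 5
    models = bagging_fit(xs, ys)
    print(bagging_predict(models, 1), bagging_predict(models, 10))
```

Bagging reduces variance by averaging learners trained on perturbed copies of the data; for a continuous attribute such as height, the vote would typically be replaced by the mean of the members' predictions.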
Taxonomy
Topics: Remote Sensing and Land Use
