# OpenStreetMap-derived multimodal dataset across 23 cities: Paired urban morphology tiles with bioclimatic variables

**Authors:** Tao He, Wei Lu

PMC · DOI: 10.1016/j.dib.2026.112518 · 2026-01-27

## TL;DR

This paper introduces an OpenStreetMap-derived dataset of 11,711 paired map tiles from 23 cities, each linkable to bioclimatic variables, to support machine learning and urban design studies.

## Contribution

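The contribution is a paired multimodal dataset: each tile couples a stylized ecological baseline with an urban morphology map and carries coordinates that link it to WorldClim v2.1 bioclimatic variables, so it can be used directly in machine learning pipelines.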

## Key findings

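- The dataset includes 11,711 tile-level samples (768 × 768 m each), each pairing an ecological baseline map with an aligned urban morphology map.
- Each sample includes latitude/longitude, from which the eight WorldClim v2.1 bioclimatic variables can be reconstructed locally with the provided script, enabling climate-informed urban analysis.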
- The dataset supports generative models and comparative urban studies across different climates.
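
To illustrate how a tile's latitude/longitude can be tied to a climate raster, here is a minimal sketch of the coordinate-to-cell lookup. It is not the authors' script. It assumes a global, regularly spaced latitude/longitude grid with its upper-left corner at (90° N, 180° W); the function name `latlon_to_cell` and the 10 arc-minute default resolution are illustrative choices.

```python
def latlon_to_cell(lat, lon, res_arcmin=10.0):
    """Map a latitude/longitude pair to (row, col) on a global grid.

    Assumption (hypothetical layout, not taken from the paper): the raster
    spans -180..180 degrees longitude and -90..90 degrees latitude, with
    row 0 at the northern edge and column 0 at the western edge.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")

    # Number of cells along each axis at the chosen resolution.
    n_rows = round(180.0 * 60.0 / res_arcmin)
    n_cols = round(360.0 * 60.0 / res_arcmin)

    # Offsets from the upper-left corner, measured in whole cells.
    row = int((90.0 - lat) * 60.0 / res_arcmin)
    col = int((lon + 180.0) * 60.0 / res_arcmin)

    # Points exactly on the southern or eastern edge fall in the last cell.
    return min(row, n_rows - 1), min(col, n_cols - 1)
```

With a raster library, the resulting `(row, col)` would index the band array of each bioclimatic layer; that step is omitted because it depends on the files and tooling actually used.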

## Abstract

We present an OpenStreetMap-derived multimodal dataset spanning 23 cities and 11,711 tile-level samples. For each 768 × 768 m tile, we provide an aligned image pair: (i) a stylized ecological baseline that generalizes green and water features together with major roads and railways, and (ii) a target urban morphology map color-coded by functional building classes, transport infrastructure, green space, and water. Each sample includes latitude/longitude; the eight WorldClim v2.1 bioclimatic variables can be reconstructed locally with the provided script. The dataset is organized by city and indexed with JSONL records linking image paths and attributes, enabling direct integration into machine learning pipelines. Cross-city and cross-climate coverage supports training and evaluation of generative models for urban design, comparative analyses of morphology across climate regimes, and imputation of functional footprints in data-scarce regions. The ecological baseline represents a constructed pre-urban template rather than a historical map.
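
Since the dataset is indexed with JSONL records, a loader along the following lines could read the index and resolve each image pair. This is a sketch under stated assumptions: the field names `city`, `baseline`, and `target` are hypothetical placeholders, not the dataset's documented schema, so check them against the actual records before use.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_index(jsonl_path):
    """Read a JSONL index: one JSON object per non-empty line."""
    records = []
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def group_by_city(records, city_key="city"):
    """Group records by city; `city_key` is an assumed field name."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.get(city_key, "unknown")].append(rec)
    return dict(groups)


def resolve_pair(record, root, baseline_key="baseline", target_key="target"):
    """Return (baseline_path, target_path) under the dataset root.

    Both key names are assumptions made for illustration.
    """
    root = Path(root)
    return root / record[baseline_key], root / record[target_key]
```

Grouping by city makes it straightforward to hold out whole cities for evaluation, which fits the cross-city and cross-climate use cases the abstract describes.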

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12907003/full.md

---
Source: https://tomesphere.com/paper/PMC12907003