# A spatially interpretable machine learning framework for urban waterlogging risk mapping in Beijing

**Authors:** Yi Tang

PMC · DOI: 10.7717/peerj.20977 · PeerJ · 2026-03-19

## TL;DR

This paper introduces a spatially interpretable machine learning framework for mapping urban waterlogging risk in Beijing. It combines remote sensing and geospatial predictors with hybrid XGBoost and MGWR models, and selects the MGWR_XGBoost hybrid for final risk mapping.

## Contribution

A hybrid machine learning framework that integrates XGBoost with multiscale geographically weighted regression (MGWR) for spatially interpretable urban waterlogging risk mapping.
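
The abstract describes the selected MGWR_XGBoost strategy as MGWR modeling spatially varying effects first, with XGBoost then fitting the residuals. The following is a minimal, dependency-free sketch of that two-stage residual idea only, not the paper's implementation. It uses neither MGWR nor XGBoost: a single-bandwidth Gaussian-kernel local regression stands in for MGWR, and a one-split regression stump stands in for XGBoost. The toy grid, the features `xs` and `zs`, the continuous target, and the bandwidth are all hypothetical, and the fit is evaluated in-sample rather than with the spatially blocked cross-validation the paper reports.

```python
import math

def gaussian_weights(site, coords, bandwidth):
    """Gaussian distance-decay weights of every training point relative to `site`."""
    return [math.exp(-0.5 * (math.dist(site, c) / bandwidth) ** 2) for c in coords]

def local_linear_predict(site, x_site, coords, xs, ys, bandwidth):
    """Stage 1 stand-in: weighted least squares y ~ a + b*x, fitted locally at `site`."""
    w = gaussian_weights(site, coords, bandwidth)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x_site - mx)

def fit_stump(zs, residuals):
    """Stage 2 stand-in: one-split regression stump minimising squared error."""
    best = None
    for threshold in sorted(set(zs))[:-1]:
        left = [r for z, r in zip(zs, residuals) if z <= threshold]
        right = [r for z, r in zip(zs, residuals) if z > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # feature is constant: fall back to the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda z: mean
    _, threshold, lm, rm = best
    return lambda z: lm if z <= threshold else rm

# Hypothetical toy data on a 5 x 5 grid (not from the paper).
coords = [(i, j) for i in range(5) for j in range(5)]
xs = [((3 * i + 7 * j) % 5) / 4 for i, j in coords]  # effect of x varies with location
zs = [float((i + j) % 2) for i, j in coords]          # step effect the local model ignores
ys = [(0.2 + 0.1 * i) * x + 0.3 * z for (i, j), x, z in zip(coords, xs, zs)]

# Stage 1: spatially varying fit. Stage 2: fit what stage 1 left unexplained.
stage1 = [local_linear_predict(c, x, coords, xs, ys, bandwidth=1.5)
          for c, x in zip(coords, xs)]
residuals = [y - p for y, p in zip(ys, stage1)]
stump = fit_stump(zs, residuals)
hybrid = [p + stump(z) for p, z in zip(stage1, zs)]

def mse(pred):
    return sum((y - p) ** 2 for y, p in zip(ys, pred)) / len(ys)

mse_stage1 = mse(stage1)
mse_hybrid = mse(hybrid)
print(f"stage 1 MSE: {mse_stage1:.5f}  hybrid MSE: {mse_hybrid:.5f}")
```

Swapping the two stand-ins (fit the stump first, then the local model on its residuals) gives the reverse ordering that the abstract calls XGBoost_MGWR.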

## Key findings

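- Among the four standalone algorithms (RF, SVM, KNN, XGBoost), XGBoost achieved the best classification performance, with an AUC of 0.913 ± 0.055.
- Under spatially blocked five-fold cross-validation, the MGWR_XGBoost hybrid gave the best probabilistic accuracy (Brier score 0.289 ± 0.039) and the highest PR-AUC (0.576), with a specificity of 0.734.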
- The framework enables high-resolution risk mapping and supports urban flood governance.
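
For readers less familiar with the metrics quoted above, here is a minimal sketch of how a Brier score, a PR-AUC, and a specificity can be computed from binary labels and predicted probabilities. The labels and probabilities are made up for illustration. PR-AUC is estimated here with step-wise average precision, which is one common estimator and not necessarily the one used in the paper, and the 0.5 threshold for specificity is likewise an assumption.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def average_precision(y_true, y_prob):
    """Step-wise area under the precision-recall curve.

    Assumes no tied scores; ties would need grouping before ranking.
    """
    ranked = sorted(zip(y_prob, y_true), key=lambda t: -t[0])
    n_pos = sum(y_true)
    tp = 0
    ap = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            ap += tp / rank  # precision at the rank where recall increases
    return ap / n_pos

def specificity(y_true, y_prob, threshold=0.5):
    """True negative rate: TN / (TN + FP) at the given probability threshold."""
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tn / (tn + fp)

# Hypothetical labels (1 = waterlogged) and predicted probabilities.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.05]

brier = brier_score(y_true, y_prob)
pr_auc = average_precision(y_true, y_prob)
spec = specificity(y_true, y_prob)
print(f"Brier: {brier:.4f}  PR-AUC: {pr_auc:.4f}  specificity: {spec:.2f}")
```

Lower is better for the Brier score, while higher is better for PR-AUC and specificity.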

## Abstract

Urban waterlogging is an escalating challenge under rapid urbanization and climate change, yet accurate spatial prediction remains hindered by nonlinear drivers and spatial heterogeneity. This study proposes a spatially interpretable machine learning framework that integrates remote sensing and geospatial data with hybrid modeling. Using recorded waterlogging locations in Beijing, we constructed a balanced dataset with topographic, hydrological, land cover, and proximity-based predictors. Four machine learning algorithms were evaluated: Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and eXtreme Gradient Boosting (XGBoost). XGBoost achieved the best classification performance (area under the curve (AUC) = 0.913 ± 0.055). To enhance spatial interpretability, two hybrid strategies combining XGBoost with multiscale geographically weighted regression (MGWR) were further developed: (1) XGBoost_MGWR, in which XGBoost serves as the primary predictor and MGWR corrects its spatially structured residuals, thereby improving spatial explanatory power; and (2) MGWR_XGBoost, in which MGWR first models spatially varying effects and XGBoost subsequently fits the residuals to refine predictive performance. Results from spatially blocked five-fold cross-validation show that MGWR_XGBoost provides the best probabilistic accuracy (Brier score = 0.289 ± 0.039) and the highest area under the precision-recall curve (PR-AUC = 0.576), with substantially higher specificity (0.734) and a spatially stable local R² pattern; it was therefore selected for final risk mapping. The proposed framework enables high-resolution, spatially explicit risk mapping and offers practical support for drainage planning, green infrastructure prioritization, and adaptive flood governance. Beyond Beijing, this approach shows strong potential for improving resilience in other data-scarce urban environments facing intensifying flood risks.
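
The spatially blocked five-fold cross-validation mentioned in the abstract keeps nearby points in the same fold, so that a model is not tested on locations adjacent to its training data. A minimal sketch of one way to build such folds is shown below. It assumes square grid blocks; the block size, the toy coordinates, and the function name `spatial_block_folds` are hypothetical, and the paper's actual blocking scheme is not given in this summary.

```python
from collections import defaultdict

def spatial_block_folds(coords, block_size, n_folds=5):
    """Assign each point a fold id so that all points in one grid block share a fold."""
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(coords):
        blocks[(int(x // block_size), int(y // block_size))].append(idx)

    folds = [0] * len(coords)
    sizes = [0] * n_folds
    # Deal blocks out largest-first, each to the fold that is currently smallest,
    # which keeps fold sizes roughly balanced.
    for key in sorted(blocks, key=lambda k: (-len(blocks[k]), k)):
        target = sizes.index(min(sizes))
        for idx in blocks[key]:
            folds[idx] = target
        sizes[target] += len(blocks[key])
    return folds

# Hypothetical sample locations: a 10 x 10 lattice of points in map units.
coords = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
folds = spatial_block_folds(coords, block_size=2.0, n_folds=5)

# Build the train/test index lists for each of the five folds.
splits = []
for k in range(5):
    test_idx = [i for i, f in enumerate(folds) if f == k]
    train_idx = [i for i, f in enumerate(folds) if f != k]
    splits.append((train_idx, test_idx))
    print(f"fold {k}: {len(train_idx)} train points, {len(test_idx)} test points")
```

Larger blocks give stronger spatial separation between training and test points but fewer distinct blocks per fold, so the block size is a tuning choice that depends on how far spatial autocorrelation extends in the data.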

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13006009/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13006009/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/PMC13006009/full.md

---
Source: https://tomesphere.com/paper/PMC13006009