TL;DR
GeoFormer is a lightweight Swin Transformer model that estimates building height and footprint on a 100 m grid from open-access Sentinel-1, Sentinel-2, and DEM data. It outperforms CNN baselines and transfers well across diverse cities.
Contribution
The paper introduces GeoFormer, a novel multi-task transformer framework that jointly estimates building height and footprint from open-access satellite data, and accompanies it with a comprehensive evaluation and a public release.
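To make "jointly estimates" concrete, here is one common way a multi-task regressor is trained: a weighted sum of per-task losses. This is an assumption, not GeoFormer's published objective. The function name multitask_loss, the use of mean squared error for both tasks, the weights w_bh and w_bf, and the treatment of BF as a per-cell footprint fraction are all hypothetical.

```python
import numpy as np

def multitask_loss(bh_pred, bh_true, bf_pred, bf_true, w_bh=1.0, w_bf=1.0):
    """Weighted sum of per-task mean squared errors (hypothetical weighting)."""
    # Height term: errors in metres, squared and averaged over grid cells.
    loss_bh = np.mean((np.asarray(bh_pred) - np.asarray(bh_true)) ** 2)
    # Footprint term: assumes BF is a per-cell fraction in [0, 1].
    loss_bf = np.mean((np.asarray(bf_pred) - np.asarray(bf_true)) ** 2)
    # Both tasks share one scalar objective, so gradients from each
    # task flow into the same shared backbone during training.
    return w_bh * loss_bh + w_bf * loss_bf
```

For example, multitask_loss([10.0, 20.0], [12.0, 20.0], [0.3, 0.5], [0.2, 0.5]) evaluates to about 2.005, almost all of it from the height term. Because height errors are measured in metres while footprint fractions lie in [0, 1], the raw height loss tends to dominate, which is why per-task weights (or per-task normalisation) matter in practice.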
Findings
GeoFormer achieves a 3.19 m RMSE in building height estimation.
It outperforms CNN baselines such as UNet by 7.5%.
A 5×5 receptive field and the inclusion of DEM data give the best height-estimation results.
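The 5×5 receptive field finding implies that each height estimate draws on a small neighbourhood rather than a single cell. Below is a minimal sketch of building such inputs, assuming the Sentinel-1, Sentinel-2, and DEM layers are already co-registered into one (bands, H, W) array. The helper extract_windows and the reflect padding at raster edges are illustrative choices. This summary also does not say whether the window is counted in 100 m cells or in native Sentinel pixels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def extract_windows(stack, size=5):
    """Return one size x size patch per grid cell of a (bands, H, W) raster.

    Edges are padded by reflection so every cell, including border cells,
    gets a full window centred on it.
    """
    half = size // 2
    # Pad only the two spatial axes, not the band axis.
    padded = np.pad(stack, ((0, 0), (half, half), (half, half)), mode="reflect")
    # windows has shape (bands, H, W, size, size)
    windows = sliding_window_view(padded, (size, size), axis=(1, 2))
    # Reorder to (H, W, bands, size, size): one patch per target cell
    return windows.transpose(1, 2, 0, 3, 4)
```

With a hypothetical 13-band stack (for instance 2 Sentinel-1 polarisations, 10 Sentinel-2 bands, and 1 DEM layer) of shape (13, H, W), extract_windows returns an array of shape (H, W, 13, 5, 5). The centre of each patch, [i, j, :, 2, 2], equals the original cell's feature vector. sliding_window_view returns a read-only view, so no data is copied until the patches are batched.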
Abstract
Building height (BH) and footprint (BF) are fundamental urban morphological parameters required by climate modelling, disaster-risk assessment, and population mapping, yet globally consistent data remain scarce. In this work, we develop GeoFormer, a lightweight Swin Transformer-based multi-task learning framework that jointly estimates BH and BF on a 100 m grid using only open-access Sentinel-1 SAR, Sentinel-2 multispectral, and DEM data. A geo-blocked data-splitting strategy enforces strict spatial independence between training and evaluation regions across 54 morphologically diverse cities. We set representative CNN baselines (ResNet, UNet, SENet) as benchmarks and thoroughly evaluate GeoFormer's prediction accuracy, computational efficiency, and spatial transferability. Results show that GeoFormer achieves a BH RMSE of 3.19 m with only 0.32 M parameters -- outperforming the best CNN…
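The geo-blocked split described in the abstract can be sketched as follows, under stated assumptions. Grid cells are identified by integer row and column indices, blocks are square with a hypothetical block_size, and whole blocks are drawn at random for evaluation. The function geo_block_split and its parameters are illustrative. The paper's actual block geometry and split ratios are not given in this summary.

```python
import numpy as np

def geo_block_split(rows, cols, block_size=10, test_frac=0.2, seed=0):
    """Assign grid cells to train/test by whole spatial blocks.

    Cells are grouped into square blocks of block_size x block_size cells;
    every cell in a block goes to the same split, so no block contributes
    to both training and evaluation.
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    block_r = rows // block_size
    block_c = cols // block_size
    # One integer ID per block (assumes non-negative indices).
    n_block_cols = int(block_c.max()) + 1
    block_id = block_r * n_block_cols + block_c
    unique_blocks = np.unique(block_id)
    # Draw whole blocks, not individual cells, for the test split.
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_frac * len(unique_blocks))))
    test_blocks = rng.choice(unique_blocks, size=n_test, replace=False)
    is_test = np.isin(block_id, test_blocks)
    return ~is_test, is_test
```

Because every cell in a block lands in the same split, no block feeds both training and evaluation. Neighbouring blocks in different splits still share an edge, however, and a model with a 5×5 input window can see across that edge. A common refinement is to discard a buffer of cells along block boundaries.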
