# SDA-Net: A Spatially Optimized Dual-Stream Network with Adaptive Global Attention for Building Extraction in Multi-Modal Remote Sensing Images

**Authors:** Xuran Pan, Kexing Xu, Shuhao Yang, Yukun Liu, Rui Zhang, Ping He

PMC · DOI: 10.3390/s25072112 · Sensors (Basel, Switzerland) · 2025-03-27

## TL;DR

This paper introduces SDA-Net, a new network for extracting buildings from multi-modal remote sensing images, improving accuracy through adaptive global attention and multi-scale feature fusion.

## Contribution

The paper proposes SDA-Net, a dual-stream architecture that uses adaptive global attention to fuse multi-scale, cross-modal features for building extraction.

## Key findings

- SDA-Net achieved 97.66% F1 score and 95.42% IoU on the ISPRS Potsdam dataset.
- The method showed 96.56% F1 score and 93.35% IoU on the ISPRS Vaihingen dataset.
- SDA-Net obtained 91.35% F1 score and 84.08% IoU on the DFC23 Track2 dataset.
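The F1 and IoU figures above are closely linked. As a minimal sketch, assuming the paper uses the standard pixel-wise definitions for the building class (the summary does not state the definitions), both metrics follow from true-positive, false-positive, and false-negative counts, and IoU can be recovered from F1 as IoU = F1 / (2 - F1). The function names and the counts in the example are illustrative, not taken from the paper.

```python
def f1_and_iou(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (F1, IoU) for one class from pixel counts.

    Assumes the standard definitions:
      F1  = 2 * TP / (2 * TP + FP + FN)
      IoU = TP / (TP + FP + FN)
    """
    if tp + fp + fn == 0:
        raise ValueError("no positive pixels in prediction or ground truth")
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou


def iou_from_f1(f1: float) -> float:
    """Convert an F1 score (as a fraction) to the IoU it implies."""
    return f1 / (2.0 - f1)


if __name__ == "__main__":
    # Hypothetical counts, only to show the two routes agree.
    f1, iou = f1_and_iou(tp=900, fp=50, fn=50)
    print(f"F1={f1:.4f} IoU={iou:.4f} IoU from F1={iou_from_f1(f1):.4f}")

    # Reported (F1, IoU) pairs from the key findings, as fractions.
    reported = {
        "ISPRS Potsdam": (0.9766, 0.9542),
        "ISPRS Vaihingen": (0.9656, 0.9335),
        "DFC23 Track2": (0.9135, 0.8408),
    }
    for name, (f1_rep, iou_rep) in reported.items():
        print(f"{name}: implied IoU={iou_from_f1(f1_rep):.4f}, reported IoU={iou_rep:.4f}")
```

Under that assumption, the reported pairs are mutually consistent: the IoU implied by each F1 score differs from the reported IoU by roughly 0.01 percentage points or less, which is within rounding.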

## Abstract

Building extraction plays a pivotal role in enabling rapid and accurate construction of urban maps, thereby supporting urban planning, smart city development, and urban management. Buildings in remote sensing imagery exhibit diverse morphological attributes and spectral signatures, yet their reliable interpretation from single-modal data remains constrained by heterogeneous terrain conditions, occlusions, and spatially variable illumination effects inherent to complex geographical landscapes. Integrating multi-modal data for building extraction offers significant advantages by leveraging complementary features from diverse data sources. However, the heterogeneity of multi-modal data complicates effective feature extraction, and multi-scale cross-modal feature fusion suffers from a semantic gap. To address these challenges, a novel building extraction network based on multi-modal remote sensing data, called SDA-Net, is proposed. […] Adaptive global attention fusion modules (AGAFMs) were designed in the decoding stage to fuse multi-modal features at various scales; they dynamically adjust the importance of features from a global perspective to better balance the semantic information. The superior performance of the proposed method is demonstrated through comprehensive evaluations on the ISPRS Potsdam dataset (97.66% F1 score, 95.42% IoU), the ISPRS Vaihingen dataset (96.56% F1 score, 93.35% IoU), and the DFC23 Track2 dataset (91.35% F1 score, 84.08% IoU).
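The idea of adjusting feature importance "from a global perspective" when fusing two modality streams can be illustrated with a small NumPy sketch. This is not the paper's AGAFM: the abstract gives no layer-level details, so the function below is a hypothetical, parameter-free stand-in that pools each stream globally, turns the pooled descriptors into per-channel weights with a softmax across modalities, and blends the streams with those weights. A real module would presumably include learned layers, which are omitted here.

```python
import numpy as np


def global_attention_fuse(feat_a: np.ndarray, feat_b: np.ndarray):
    """Fuse two (C, H, W) feature maps using weights from global context.

    Hypothetical illustration only; not the AGAFM from the paper.
    Returns the fused map and the (2, C) modality weights.
    """
    if feat_a.shape != feat_b.shape or feat_a.ndim != 3:
        raise ValueError("expected two feature maps of identical (C, H, W) shape")

    # Global average pooling: one scalar descriptor per channel and stream.
    desc = np.stack([feat_a.mean(axis=(1, 2)), feat_b.mean(axis=(1, 2))])  # (2, C)

    # Numerically stable softmax across the modality axis, so the two
    # weights for each channel are positive and sum to 1.
    desc = desc - desc.max(axis=0, keepdims=True)
    weights = np.exp(desc)
    weights = weights / weights.sum(axis=0, keepdims=True)

    # Broadcast the per-channel weights over the spatial dimensions.
    fused = weights[0][:, None, None] * feat_a + weights[1][:, None, None] * feat_b
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stream_a = rng.normal(size=(4, 8, 8))  # stand-in features from one modality
    stream_b = rng.normal(size=(4, 8, 8))  # stand-in features from the other
    fused, weights = global_attention_fuse(stream_a, stream_b)
    print(fused.shape, weights.sum(axis=0))
```

Applying the same operation at each decoder scale would give a multi-scale fusion in the spirit of the description, though how the paper combines scales is not specified in this summary.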

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11991180/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11991180/full.md

## References

57 references — full list in the complete paper: https://tomesphere.com/paper/PMC11991180/full.md

---
Source: https://tomesphere.com/paper/PMC11991180