# SpaceNet: A Multimodal Fusion Architecture for Sound Source Localization in Disaster Response

**Authors:** Long Nguyen-Vu, Jonghoon Lee

PMC · DOI: 10.3390/s26010168 · Sensors (Basel, Switzerland) · 2025-12-26

## TL;DR

SpaceNet is a deep-learning model that improves sound source localization in disaster scenarios by combining audio data with sensor geometry.

## Contribution

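SpaceNet introduces a dual-branch architecture, which processes audio features and microphone geometry in separate branches, and a feature-normalization technique that stabilizes multimodal training. Together they target robust sound source localization (SSL) in adverse environments.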
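
The summary does not give SpaceNet's layer sizes, output parameterization, or the exact form of its feature normalization. The sketch below is a minimal, untrained illustration of the general idea under stated assumptions. Each modality gets its own branch, we assume a per-sample LayerNorm-style standardization of each branch embedding before concatenation, and we assume a 3-D position as the output. `DualBranchFusion` and `normalize` are hypothetical names, and the weights are random.

```python
import numpy as np

def normalize(x, eps=1e-5):
    # Assumed normalization: standardize each sample's feature vector to
    # zero mean and unit variance so neither modality dominates by scale.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

class DualBranchFusion:
    """Hypothetical two-branch model (not the authors' code): audio and
    geometry features are embedded separately, normalized, concatenated,
    and mapped to an assumed 3-D source position."""

    def __init__(self, audio_dim, geom_dim, hidden=32, out_dim=3, seed=0):
        rng = np.random.default_rng(seed)
        self.w_audio = rng.normal(0.0, 1.0 / np.sqrt(audio_dim), (audio_dim, hidden))
        self.w_geom = rng.normal(0.0, 1.0 / np.sqrt(geom_dim), (geom_dim, hidden))
        self.w_out = rng.normal(0.0, 1.0 / np.sqrt(2 * hidden), (2 * hidden, out_dim))

    def forward(self, audio, geom):
        a = normalize(np.maximum(audio @ self.w_audio, 0.0))  # audio branch + ReLU
        g = normalize(np.maximum(geom @ self.w_geom, 0.0))    # geometry branch + ReLU
        fused = np.concatenate([a, g], axis=-1)               # late fusion
        return fused @ self.w_out                             # linear output head
```

Because ReLU is positively homogeneous and the normalization is scale-invariant (up to `eps`), rescaling the geometry input, for example from metres to millimetres, barely changes the output. This is one way per-branch normalization can keep a large-valued modality from swamping the other during training.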

## Key findings

- When trained on ILD-only mel-spectra, SpaceNet is more accurate than the baseline model (CHAWA) and than identical models trained on full mel-spectrograms.
- Using ILD cues instead of full mel-spectrograms reduces computational overhead by a factor of 24.
- For distributed arrays in adverse conditions, time-invariant ILD features are more effective than temporal features corrupted by reverberation and sensor asynchrony.
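
The summary does not say exactly how SpaceNet derives its ILD-only mel-spectra. The sketch below assumes one plausible construction: average each microphone's mel-band power over time, convert to dB, and take per-band level differences between every microphone pair. `ild_features` is a hypothetical name. Modelling sensor asynchrony as a circular frame shift is also a simplifying assumption. It shows why a time-averaged level cue is insensitive to timing errors that would corrupt TDoA.

```python
import numpy as np

def ild_features(mel_power, eps=1e-10):
    """Hypothetical time-invariant ILD extractor (assumed construction).

    mel_power: array of shape (n_mics, n_mels, n_frames) of mel-band power.
    Returns (ild, pairs): ild has shape (n_pairs, n_mels) in dB, one row per
    microphone pair (i, j) with i < j.
    """
    mel_power = np.asarray(mel_power, dtype=float)
    # Average power over time: this discards all timing information.
    mean_power = mel_power.mean(axis=-1)            # (n_mics, n_mels)
    level_db = 10.0 * np.log10(mean_power + eps)    # per-mic band levels in dB
    n_mics = level_db.shape[0]
    pairs = [(i, j) for i in range(n_mics) for j in range(i + 1, n_mics)]
    # Level of mic i relative to mic j in every mel band.
    ild = np.stack([level_db[i] - level_db[j] for i, j in pairs])
    return ild, pairs

rng = np.random.default_rng(0)
mel = rng.random((4, 64, 100)) + 0.01   # 4 mics, 64 mel bands, 100 frames
ild, pairs = ild_features(mel)

# Simulate a clock offset on mic 2 by shifting its frames in time.
shifted = mel.copy()
shifted[2] = np.roll(shifted[2], 7, axis=-1)
ild_shifted, _ = ild_features(shifted)
print(ild.shape, np.allclose(ild, ild_shifted))  # (6, 64) True
```

In this sketch, collapsing the time axis also shrinks each input from `n_mics * n_mels * n_frames` values to `n_pairs * n_mels`. That reduction is illustrative only. It is not a derivation of the paper's reported factor of 24.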

## Abstract

Sound source localization (SSL) has evolved from traditional signal-processing methods to sophisticated deep-learning architectures. However, applying these to distributed microphone arrays in adverse environments is complicated by high reverberation and potential sensor asynchrony, which can corrupt crucial Time-Difference-of-Arrival (TDoA) information. We introduce SpaceNet, a multimodal deep-learning architecture designed to address these issues by explicitly fusing audio features with sensor geometry. SpaceNet has two key components: (1) a dual-branch architecture with specialized spatial processing that decomposes microphone geometry into distances, azimuths, and elevations; and (2) a feature-normalization technique to ensure stable multimodal training. Evaluation on real-world datasets from disaster sites demonstrates that SpaceNet, when trained on ILD-only mel-spectra, achieves better accuracy than both our baseline model (CHAWA) and identical models trained on full mel-spectrograms. This approach also reduces computational overhead by a factor of 24. Our findings suggest that for distributed arrays in adverse environments, time-invariant ILD cues are a more effective and efficient feature for localization than complex temporal features corrupted by reverberation and synchronization errors.
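
The abstract says SpaceNet's spatial branch decomposes microphone geometry into distances, azimuths, and elevations. It does not say whether these are measured pairwise, from the array centroid, or from a reference microphone. The sketch below assumes pairwise vectors between microphones in a shared Cartesian frame (x, y, z in metres). `decompose_geometry` is a hypothetical name.

```python
import numpy as np

def decompose_geometry(positions):
    """Hypothetical pairwise geometry decomposition (assumed, not the paper's code).

    positions: (n_mics, 3) array of x, y, z coordinates in metres.
    Returns distance (m), azimuth and elevation (radians) for every pair i < j,
    ordered as (0, 1), (0, 2), ..., (1, 2), ...
    """
    positions = np.asarray(positions, dtype=float)
    i_idx, j_idx = np.triu_indices(positions.shape[0], k=1)
    delta = positions[j_idx] - positions[i_idx]         # vector from mic i to mic j
    dist = np.linalg.norm(delta, axis=1)
    azimuth = np.arctan2(delta[:, 1], delta[:, 0])      # angle within the x-y plane
    horizontal = np.hypot(delta[:, 0], delta[:, 1])
    elevation = np.arctan2(delta[:, 2], horizontal)     # angle above the x-y plane
    return dist, azimuth, elevation

mics = [[0, 0, 0], [3, 4, 0], [0, 0, 2]]
dist, az, el = decompose_geometry(mics)
```

For these three microphones the pair distances are 5 m, 2 m, and √29 m. The purely vertical pair (0, 2) has an elevation of π/2, and its azimuth is undefined, so `arctan2` returns 0 for it. Separating the components this way lets a network see range and direction explicitly rather than infer them from raw coordinates.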

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12788293/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12788293/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12788293/full.md

---
Source: https://tomesphere.com/paper/PMC12788293