SSLIDE: Sound Source Localization for Indoors based on Deep Learning
Yifan Wu, Roshan Ayyalasomayajula, Michael J. Bianco, Dinesh Bharadia, and Peter Gerstoft

TL;DR
SSLIDE employs deep neural networks with an encoder-decoder structure to accurately localize sound sources indoors, outperforming traditional methods in reverberant environments and demonstrating strong generalization capabilities.
Contribution
The paper introduces a novel deep learning approach with an encoder-decoder architecture for sound source localization in indoor spaces, handling multipath effects and improving accuracy.
Findings
Outperforms traditional localization methods such as MUSIC, SRP-PHAT, and SBL.
Effective in reverberant environments with strong generalization.
Uses likelihood surface representations for sound signals.
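To make the idea of a likelihood surface concrete, here is a minimal sketch in Python. It is not the paper's feature extraction, which this summary does not describe. It assumes a hypothetical 4 m x 4 m room with one microphone in each corner, noise-free time-difference-of-arrival (TDOA) measurements, and a Gaussian error model with an assumed spread `sigma`. Every function name and parameter below is illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def tdoa(source, mic_a, mic_b):
    """Time difference of arrival (seconds) between two microphones."""
    return (math.dist(source, mic_a) - math.dist(source, mic_b)) / SPEED_OF_SOUND


def likelihood_surface(mic_pairs, measured_tdoas, xs, ys, sigma=1e-4):
    """Return surface[i][j], the unnormalized likelihood of a source at (xs[j], ys[i]).

    Each microphone pair contributes a Gaussian term on the TDOA mismatch.
    sigma (seconds) is an assumed measurement spread, not a value from the paper.
    """
    surface = []
    for y in ys:
        row = []
        for x in xs:
            log_l = 0.0
            for (a, b), tau in zip(mic_pairs, measured_tdoas):
                err = tdoa((x, y), a, b) - tau
                log_l += -0.5 * (err / sigma) ** 2
            row.append(math.exp(log_l))
        surface.append(row)
    return surface


def argmax_location(surface, xs, ys):
    """Return the grid point (x, y) with the highest likelihood."""
    _, i, j = max((v, i, j) for i, row in enumerate(surface) for j, v in enumerate(row))
    return xs[j], ys[i]


# Hypothetical setup: microphones at the corners of a 4 m x 4 m room.
mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
pairs = [(mics[0], mics[1]), (mics[0], mics[2]), (mics[0], mics[3])]
true_source = (1.0, 3.0)
taus = [tdoa(true_source, a, b) for a, b in pairs]  # noise-free measurements

xs = [0.25 * k for k in range(17)]  # 0.0 .. 4.0 m in 0.25 m steps
ys = [0.25 * k for k in range(17)]
surface = likelihood_surface(pairs, taus, xs, ys)
estimate = argmax_location(surface, xs, ys)
print(estimate)  # (1.0, 3.0)
```

In this free-field toy case the peak of the surface falls on the true source. Reverberation would add spurious peaks from reflected paths, which is the ambiguity the summary says one of SSLIDE's decoders is trained to resolve.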
Abstract
This paper presents SSLIDE, Sound Source Localization for Indoors using DEep learning, which applies deep neural networks (DNNs) with an encoder-decoder structure to localize sound sources at random positions in a continuous space. The spatial features of the sound signals received by each microphone are extracted and represented as likelihood surfaces for the sound source location at each point. Our DNN consists of an encoder network followed by two decoders. The encoder obtains a compressed representation of the input likelihoods. One decoder resolves the multipath caused by reverberation, and the other decoder estimates the source location. Experiments based on both simulated and experimental data show that our method can not only outperform multiple signal classification (MUSIC), steered response power with phase transform (SRP-PHAT), sparse Bayesian learning (SBL), and a competing…
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Indoor and Outdoor Localization Technologies
