Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization
Christopher Schymura, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa

TL;DR
This paper introduces an attention-based sequence-to-sequence neural network for sound event localization, improving robustness and accuracy in noisy and reverberant environments by capturing temporal dependencies.
Contribution
It presents a novel application of attention-based sequence-to-sequence models to sound event localization and reports that the approach outperforms existing methods across a range of acoustic conditions.
Findings
Superior localization accuracy in both anechoic and reverberant environments
Effective capture of temporal dependencies through attention mechanisms
Outperforms state-of-the-art methods on multiple datasets
Abstract
Sound event localization frameworks based on deep neural networks have shown increased robustness with respect to reverberation and noise in comparison to classical parametric approaches. In particular, recurrent architectures that incorporate temporal context into the estimation process seem to be well-suited for this task. This paper proposes a novel approach to sound event localization by utilizing an attention-based sequence-to-sequence model. These types of models have been successfully applied to problems in natural language processing and automatic speech recognition. In this work, a multi-channel audio signal is encoded to a latent representation, which is subsequently decoded to a sequence of estimated directions-of-arrival. Herein, attentions allow for capturing temporal dependencies in the audio signal by focusing on specific frames that are relevant for estimating the…
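The abstract describes a two-stage data flow. An encoder maps the multi-channel signal to a latent sequence with one vector per frame. A decoder then emits one direction-of-arrival per output step, using attention to weight the frames that are most relevant to that step. The NumPy sketch below illustrates only this flow. It assumes scaled dot-product attention, a one-layer tanh encoder, and a linear output layer that produces an (azimuth, elevation) pair. The function names (`encode`, `attention_step`, `decode_doas`), the dimensions, and the random weights are hypothetical and are not taken from the paper, since the excerpt does not specify the attention variant or the layer types.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def encode(features: np.ndarray, w_enc: np.ndarray) -> np.ndarray:
    """Hypothetical encoder: per-frame linear projection followed by tanh.

    features: (T, C*F) multi-channel features, one row per audio frame
    w_enc:    (d, C*F) projection weights
    Returns the latent sequence of shape (T, d).
    """
    return np.tanh(features @ w_enc.T)

def attention_step(query: np.ndarray, keys: np.ndarray, values: np.ndarray):
    """Scaled dot-product attention for a single decoder step (assumed variant).

    query:  (d,)   current decoder state
    keys:   (T, d) encoder latents, one row per frame
    values: (T, d)
    Returns the context vector (d,) and the attention weights over frames (T,).
    """
    scores = keys @ query / np.sqrt(keys.shape[-1])  # one score per frame
    weights = softmax(scores)                         # non-negative, sums to 1
    context = weights @ values                        # weighted sum of frames
    return context, weights

def decode_doas(latents: np.ndarray, w_out: np.ndarray, num_steps: int):
    """Toy decoder: each step attends over all frames and maps the context
    to an (azimuth, elevation) pair with a linear layer. As a simplification,
    the next decoder state is just the previous context vector."""
    state = latents.mean(axis=0)
    doas, all_weights = [], []
    for _ in range(num_steps):
        context, weights = attention_step(state, latents, latents)
        doas.append(w_out @ context)
        all_weights.append(weights)
        state = context
    return np.stack(doas), np.stack(all_weights)

# Usage with random inputs: 4 channels x 32 features per frame, 50 frames.
rng = np.random.default_rng(0)
T, C, F, d = 50, 4, 32, 16
features = rng.standard_normal((T, C * F))
w_enc = rng.standard_normal((d, C * F)) * 0.1
w_out = rng.standard_normal((2, d)) * 0.1

latents = encode(features, w_enc)
doas, weights = decode_doas(latents, w_out, num_steps=5)
print(doas.shape, weights.shape)  # (5, 2) (5, 50)
```

Each row of `weights` shows how strongly one output step attends to each input frame. This is the mechanism the abstract credits with capturing temporal dependencies. A trained model would learn `w_enc`, `w_out`, and a proper recurrent decoder state instead of using random weights and the context shortcut shown here.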
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Underwater Acoustics Research
