Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization

Christopher Schymura, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa

arXiv:2103.00417·cs.SD·March 2, 2021

TL;DR

This paper introduces an attention-based sequence-to-sequence neural network for sound event localization, improving robustness and accuracy in noisy and reverberant environments by capturing temporal dependencies.

Contribution

The paper applies attention-based sequence-to-sequence models to sound event localization and reports that they outperform existing methods in various acoustic conditions.

Findings

01 Superior localization accuracy in both anechoic and reverberant environments

02 Effective capture of temporal dependencies through attention mechanisms

03 Outperforms state-of-the-art methods on multiple datasets

Abstract

Sound event localization frameworks based on deep neural networks have shown increased robustness with respect to reverberation and noise in comparison to classical parametric approaches. In particular, recurrent architectures that incorporate temporal context into the estimation process seem to be well-suited for this task. This paper proposes a novel approach to sound event localization by utilizing an attention-based sequence-to-sequence model. These types of models have been successfully applied to problems in natural language processing and automatic speech recognition. In this work, a multi-channel audio signal is encoded to a latent representation, which is subsequently decoded to a sequence of estimated directions-of-arrival. Herein, attentions allow for capturing temporal dependencies in the audio signal by focusing on specific frames that are relevant for estimating the…
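
To illustrate how attention can focus on specific frames, here is a minimal sketch of one decoder step attending over a sequence of encoded audio frames. It assumes generic scaled dot-product attention; the scoring function the paper actually uses is not stated in this excerpt. The names `attend`, `encoder_states`, and `query` are hypothetical and are not taken from the paper or its repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """One attention step (assumed scaled dot-product scoring).

    query          -- decoder state for the current output step, length D
    encoder_states -- list of T encoded audio frames, each of length D
    Returns (weights, context): one weight per frame, and the
    weighted sum of the frames.
    """
    dim = len(query)
    scores = [
        sum(q * h for q, h in zip(query, state)) / math.sqrt(dim)
        for state in encoder_states
    ]
    weights = softmax(scores)
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Toy data: three encoded frames of dimension 2 (illustrative values only).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(query, encoder_states)
```

In this toy case the first and third frames align with the query, so they receive equal weights that are larger than the weight of the second frame. In a full model, the context vector would be passed on to the decoder to help produce the direction-of-arrival estimate for that step.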

Code & Models

Repositories

rub-ksv/adrenaline (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Underwater Acoustics Research