PILOT: Introducing Transformers for Probabilistic Sound Event Localization

Christopher Schymura; Benedikt Bönninghoff; Tsubasa Ochiai; Marc Delcroix; Keisuke Kinoshita; Tomohiro Nakatani; Shoko Araki; Dorothea Kolossa

arXiv:2106.03903·cs.SD·June 9, 2021

TL;DR

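This paper introduces a transformer-based framework for sound event localization that captures temporal dependencies in multi-channel audio via self-attention and represents estimated source positions as multivariate Gaussian variables to model their uncertainty. It is evaluated on three publicly available multi-source datasets, where it outperforms existing methods.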

Contribution

The paper presents a novel transformer architecture for sound event localization that models uncertainty by representing estimated source positions as multivariate Gaussian variables, and that surpasses prior approaches based on recurrent neural networks.
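To make the idea of a Gaussian position estimate concrete, here is a minimal sketch of a Gaussian negative log-likelihood, which is one common way to score a predicted mean and variance against a true position. It is not taken from the paper: the function name `gaussian_nll`, the diagonal covariance, and the (azimuth, elevation) parameterization in radians are all assumptions made for illustration.

```python
import numpy as np

def gaussian_nll(target, mean, log_var):
    """Negative log-likelihood of `target` under N(mean, diag(exp(log_var))).

    Assumption (not from the paper): the covariance is diagonal and the
    model predicts log-variances, which keeps the variances positive.
    """
    target, mean, log_var = (np.asarray(a, dtype=float) for a in (target, mean, log_var))
    var = np.exp(log_var)
    per_dim = log_var + (target - mean) ** 2 / var + np.log(2.0 * np.pi)
    return 0.5 * np.sum(per_dim, axis=-1)

# Hypothetical (azimuth, elevation) values in radians.
true_pos = np.array([0.5, 0.1])
estimate = np.array([0.8, -0.2])

# Same localization error, reported with low and with high uncertainty.
nll_overconfident = gaussian_nll(true_pos, estimate, log_var=np.array([-6.0, -6.0]))
nll_cautious = gaussian_nll(true_pos, estimate, log_var=np.array([-2.0, -2.0]))
```

With the same error, the estimate that reports a larger variance receives the lower loss here, which illustrates why a probabilistic output carries information that a point estimate does not.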

Findings

01 Outperforms state-of-the-art methods on all tested datasets.

02 Effectively models uncertainty in source localization.

03 Achieves statistically significant improvements in accuracy.

Abstract

Sound event localization aims at estimating the positions of sound sources in the environment with respect to an acoustic receiver (e.g. a microphone array). Recent advances in this domain most prominently focused on utilizing deep recurrent neural networks. Inspired by the success of transformer architectures as a suitable alternative to classical recurrent neural networks, this paper introduces a novel transformer-based sound event localization framework, where temporal dependencies in the received multi-channel audio signals are captured via self-attention mechanisms. Additionally, the estimated sound event positions are represented as multivariate Gaussian variables, yielding an additional notion of uncertainty, which many previously proposed deep learning-based systems designed for this application do not provide. The framework is evaluated on three publicly available multi-source…
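As a rough illustration of how self-attention can capture temporal dependencies across audio frames, the sketch below implements single-head scaled dot-product self-attention in NumPy. It is a generic textbook formulation, not the paper's architecture: the frame count, feature sizes, random projection matrices, and the function name `self_attention` are all hypothetical.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over time frames.

    x has shape (num_frames, feature_dim); every output frame is a
    weighted mix of all frames, so distant frames can interact directly.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
frames = rng.standard_normal((5, 8))  # 5 time frames with 8 features each (made up)
w_q, w_k, w_v = (rng.standard_normal((8, 4)) for _ in range(3))
context, weights = self_attention(frames, w_q, w_k, w_v)
```

Each row of `weights` sums to one and indicates how strongly one frame attends to every other frame, in contrast to a recurrent network that passes information step by step.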

Code & Models

Repositories

chrschy/pilot (PyTorch)
