SALADnet: Self-Attentive multisource Localization in the Ambisonics Domain

Pierre-Amaury Grumiaux; Srdan Kitic; Prerak Srivastava; Laurent Girin; Alexandre Guérin

arXiv:2107.11066·cs.SD·July 26, 2021

Open Access

TL;DR

This paper introduces SALADnet, a self-attention neural network for multi-speaker localization from Ambisonics recordings. It matches or outperforms a convolutional recurrent (CRNN) baseline and runs faster because it can be computed in parallel.

Contribution

The paper replaces the recurrent layers of a state-of-the-art CRNN with self-attention encoders taken from the Transformer architecture, improving multi-source localization accuracy and computational efficiency.

Findings

01

Self-attention models outperform the CRNN baseline most clearly in multi-speaker scenarios (up to 3 simultaneous speakers).

02

Without recurrent layers, the proposed models can be computed in parallel, which considerably reduces execution time.

03

Most of the proposed architectures perform on par with or better than the state-of-the-art CRNN baseline on both synthetic and real-world data.

Abstract

In this work, we propose a novel self-attention-based neural network for robust multi-speaker localization from Ambisonics recordings. Starting from a state-of-the-art convolutional recurrent neural network (CRNN), we investigate the benefit of replacing the recurrent layers with self-attention encoders, inherited from the Transformer architecture. We evaluate these models on synthetic and real-world data, with up to 3 simultaneous speakers. The obtained results indicate that the majority of the proposed architectures either perform on par with, or outperform, the CRNN baseline, especially in the multisource scenario. Moreover, by avoiding the recurrent layers, the proposed models lend themselves to parallel computing, which is shown to produce considerable savings in execution time.
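To illustrate why self-attention lends itself to parallel computing, here is a minimal sketch of single-head scaled dot-product self-attention over a sequence of time frames. It is not the authors' implementation: it assumes no positional encoding, layer normalization, or feed-forward sublayers, and the function names, identity projection matrices, and toy frame values are all hypothetical. Each output frame depends only on the full input sequence, never on a previous output, so the loop over frames has no sequential dependency.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def self_attention(frames, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative only).

    frames: list of feature vectors, one per time frame.
    w_q, w_k, w_v: hypothetical query/key/value projection matrices.
    """
    queries = [matvec(w_q, f) for f in frames]
    keys = [matvec(w_k, f) for f in frames]
    values = [matvec(w_v, f) for f in frames]
    scale = math.sqrt(len(keys[0]))

    outputs = []
    # Each iteration reads only queries/keys/values, not earlier outputs,
    # so all frames could be processed at the same time.
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy input: 3 time frames with 2 features each (made-up values).
identity = [[1.0, 0.0], [0.0, 1.0]]
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(frames, identity, identity, identity)
```

A recurrent layer, by contrast, computes the hidden state at frame t from the state at frame t-1, which forces frame-by-frame evaluation. Without positional information this sketch is also order-agnostic: reversing the input frames simply reverses the outputs.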

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing