TRUNet: Transformer-Recurrent-U Network for Multi-channel Reverberant Sound Source Separation

Ali Aroudi; Stefan Uhlich; Marc Ferras Font

arXiv:2110.04047 · eess.AS · August 23, 2022

Open Access

TL;DR

TRUNet is a deep learning model that combines transformer, recurrent, and U-Net components for multi-channel reverberant sound source separation. It exploits spatial, spectral, and temporal diversity, and the authors report that it outperforms existing methods.

Contribution

The paper introduces TRUNet, an end-to-end multi-channel source separation network that directly estimates multi-channel filters from the multi-channel input spectra. It combines a spatial processing network, which applies attention across microphone channels, with a spectro-temporal processing network.
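
To make "estimating multi-channel filters" concrete, the sketch below shows how a set of per-channel complex filters is typically applied to multi-channel STFT spectra in a filter-and-sum fashion, Y(t,f) = sum over m of conj(W_m(t,f)) X_m(t,f). This is an illustration under assumptions: the conjugated filter-and-sum form, the (M, T, F) array layout, and the function name apply_multichannel_filter are hypothetical choices, not taken from the paper, and the network that would produce W is not shown.

```python
import numpy as np


def apply_multichannel_filter(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Filter-and-sum in the STFT domain (illustrative sketch, not TRUNet's code).

    X: complex mixture spectra, shape (M, T, F) = (mics, frames, frequency bins).
    W: complex per-channel filters with the same shape as X; in a TRUNet-style
       system these would be predicted by the network (assumption).
    Returns Y with shape (T, F), where Y[t, f] = sum_m conj(W[m, t, f]) * X[m, t, f].
    """
    if X.shape != W.shape:
        raise ValueError("X and W must have the same shape (M, T, F)")
    # Weight each microphone's spectrum by its filter, then sum over microphones.
    return np.sum(np.conj(W) * X, axis=0)
```

Two sanity cases illustrate the design: a filter that is 1 on the first microphone and 0 elsewhere simply selects that channel, while equal weights of 1/M reduce to channel averaging. A learned filter can vary per time-frequency bin, which is what lets it exploit spatial diversity rather than applying one fixed beamformer.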

Findings

1. TRUNet outperforms state-of-the-art methods on realistic reverberant datasets.
2. The model effectively captures spatial, spectral, and temporal diversities.
3. Training with a combined loss improves separation performance.

Abstract

In recent years, many deep learning techniques for single-channel sound source separation have been proposed using recurrent, convolutional and transformer networks. When multiple microphones are available, spatial diversity between speakers and background noise in addition to spectro-temporal diversity can be exploited by using multi-channel filters for sound source separation. Aiming at end-to-end multi-channel source separation, in this paper we propose a transformer-recurrent-U network (TRUNet), which directly estimates multi-channel filters from multi-channel input spectra. TRUNet consists of a spatial processing network with an attention mechanism across microphone channels aiming at capturing the spatial diversity, and a spectro-temporal processing network aiming at capturing spectral and temporal diversities. In addition to multi-channel filters, we also consider estimating…
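
The abstract mentions an attention mechanism across microphone channels but does not specify its exact form. As a rough illustration only, the sketch below applies generic single-head scaled dot-product self-attention in which, for each time frame, the M channel feature vectors act as tokens. The feature layout (T, M, D), the projection matrices Wq, Wk, Wv, and the name channel_self_attention are all hypothetical; TRUNet's actual layer may differ in heads, dimensions, and placement.

```python
import numpy as np


def channel_self_attention(H: np.ndarray, Wq: np.ndarray, Wk: np.ndarray, Wv: np.ndarray):
    """Single-head scaled dot-product self-attention across microphone channels.

    Generic sketch, not the paper's layer.
    H: real features of shape (T, M, D): T frames, M channels, D features per channel.
    Wq, Wk, Wv: (D, D) projection matrices (hypothetical; learned in practice).
    Each frame attends only across its own M channels.
    Returns (output of shape (T, M, D), attention weights of shape (T, M, M)).
    """
    Q = H @ Wq  # queries, (T, M, D)
    K = H @ Wk  # keys, (T, M, D)
    V = H @ Wv  # values, (T, M, D)
    d = Q.shape[-1]
    # Channel-to-channel similarity for every frame: (T, M, M).
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over channels
    return weights @ V, weights
```

One property of this particular sketch is that it is permutation-equivariant over channels: reordering the microphones reorders the outputs in the same way, so the layer itself does not depend on a fixed channel order. Whether TRUNet relies on or breaks this property (for example with channel embeddings) is not stated in the abstract.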

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Hearing Loss and Rehabilitation