TRUNet: Transformer-Recurrent-U Network for Multi-channel Reverberant Sound Source Separation
Ali Aroudi, Stefan Uhlich, Marc Ferras Font

TL;DR
TRUNet is a novel deep learning model combining transformer, recurrent, and U-Net architectures for multi-channel reverberant sound source separation, leveraging spatial, spectral, and temporal diversities to outperform existing methods.
Contribution
The paper introduces TRUNet, an end-to-end multi-channel source separation network that directly estimates filters from multi-channel spectra, integrating spatial attention and spectro-temporal processing.
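To make the idea of filters applied to multi-channel spectra concrete, here is a minimal sketch of a filter-and-sum step in NumPy. It is not the authors' implementation. The function name `apply_multichannel_filters`, the array layout, and the conjugate convention are assumptions chosen for illustration. In TRUNet the filters would come from the network; here they are hand-set so the behaviour is easy to check.

```python
import numpy as np

def apply_multichannel_filters(spectra, filters):
    """Filter-and-sum in the STFT domain (illustrative convention, assumed here).

    spectra: complex array, shape (mics, freqs, frames)
    filters: complex array, shape (sources, mics, freqs, frames)
    returns: complex array, shape (sources, freqs, frames)
    """
    # For every source and time-frequency bin, sum conj(filter) * spectrum over mics.
    return np.einsum("smft,mft->sft", np.conj(filters), spectra)

# Toy data: random multi-channel spectra standing in for an STFT of the mixture.
rng = np.random.default_rng(0)
mics, freqs, frames, sources = 4, 6, 10, 2
spectra = (rng.standard_normal((mics, freqs, frames))
           + 1j * rng.standard_normal((mics, freqs, frames)))

# Hand-set filters: source 0 selects microphone 0, source 1 selects microphone 1.
filters = np.zeros((sources, mics, freqs, frames), dtype=complex)
filters[0, 0] = 1.0
filters[1, 1] = 1.0

estimates = apply_multichannel_filters(spectra, filters)
```

With these selector filters, each estimate simply reproduces one microphone channel. A trained network would instead output filters that vary per time-frequency bin and combine all channels.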
Findings
TRUNet outperforms state-of-the-art methods on realistic reverberant datasets.
The model effectively captures spatial, spectral, and temporal diversities.
Training with a combined loss improves separation performance.
Abstract
In recent years, many deep learning techniques for single-channel sound source separation have been proposed using recurrent, convolutional and transformer networks. When multiple microphones are available, spatial diversity between speakers and background noise in addition to spectro-temporal diversity can be exploited by using multi-channel filters for sound source separation. Aiming at end-to-end multi-channel source separation, in this paper we propose a transformer-recurrent-U network (TRUNet), which directly estimates multi-channel filters from multi-channel input spectra. TRUNet consists of a spatial processing network with an attention mechanism across microphone channels aiming at capturing the spatial diversity, and a spectro-temporal processing network aiming at capturing spectral and temporal diversities. In addition to multi-channel filters, we also consider estimating…
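The attention mechanism across microphone channels mentioned in the abstract can be pictured with a generic scaled dot-product self-attention in which the microphones, rather than time steps, act as the tokens. The sketch below is a hypothetical illustration under that assumption. The function `channel_self_attention`, the feature dimension, and the random projection matrices are not taken from the paper, and the actual TRUNet layer may differ.

```python
import numpy as np

def channel_self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over the microphone axis.

    x: real array, shape (frames, mics, dim), one feature vector per mic and frame
    w_q, w_k, w_v: projection matrices, shape (dim, dim)
    returns: (attended features, attention weights)
    """
    q = x @ w_q
    k = x @ w_k
    v = x @ w_v
    # Similarity between every pair of microphones, per frame: (frames, mics, mics).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the "attended-to" microphone axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy usage with random features and random (untrained) projections.
rng = np.random.default_rng(0)
frames, mics, dim = 5, 4, 8
x = rng.standard_normal((frames, mics, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) for _ in range(3))
out, weights = channel_self_attention(x, w_q, w_k, w_v)
```

Each output vector is a weighted mix of all microphones' features for the same frame, which is one way a network can learn inter-channel (spatial) relationships without a fixed array geometry.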
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Hearing Loss and Rehabilitation
