Dilated U-net based approach for multichannel speech enhancement from First-Order Ambisonics recordings
Amélie Bosca, Alexandre Guérin, Lauréline Perotin, Srđan Kitić

TL;DR
This paper introduces a dilated U-net CNN architecture for multichannel speech enhancement from Ambisonics recordings, improving word error rates in challenging acoustic scenarios with fewer parameters.
Contribution
The paper replaces the previously investigated LSTM mask estimator with a dilated U-net, which improves mask prediction accuracy and robustness in multi-speaker, reverberant environments while reducing model complexity.
Findings
Dilated U-net improves speech recognition accuracy in complex conditions.
Use of dilated convolutions benefits scenarios with close-angle interfering speakers.
Model achieves similar or better performance with half the parameters.
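To illustrate why dilated convolutions can help, the sketch below shows how dilation widens the context seen by a stack of small kernels without adding parameters. It is a generic 1-D illustration, not the paper's network: the kernel size, the dilation schedule, and the function names `dilated_conv1d` and `receptive_field` are assumptions made for the example.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """'Valid' 1-D correlation of x with kernel w, taps spaced `dilation` apart."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    k = len(w)
    span = (k - 1) * dilation + 1          # input samples covered by one output
    n_out = len(x) - span + 1
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(n_out)
    ])

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 conv layers with the given dilations."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Hypothetical schedules: four layers of size-3 kernels.
plain = receptive_field(3, [1, 1, 1, 1])    # no dilation
dilated = receptive_field(3, [1, 2, 4, 8])  # doubling dilation (assumed)
```

With these assumed settings, both stacks have the same number of weights, yet the dilated one covers 31 input samples against 9 for the plain one.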
Abstract
We present a CNN architecture for speech enhancement from multichannel first-order Ambisonics mixtures. The data-dependent spatial filters, derived from a mask-based approach, help an automatic speech recognition engine cope with adverse conditions of reverberation and competing speakers. The mask predictions are provided by a neural network fed with rough estimates of the speech and noise amplitude spectra, under the assumption of known directions of arrival. This study evaluates replacing the previously investigated recurrent LSTM network with a convolutional U-net under more demanding conditions that include an additional second competing speaker. We show that, owing to more accurate short-term mask prediction, the U-net architecture brings some improvement in word error rate. Moreover, results indicate that the use of dilated convolutional layers is beneficial in…
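As a rough illustration of how a time-frequency mask can yield a data-dependent spatial filter, the sketch below builds mask-weighted spatial covariance matrices and applies a multichannel Wiener filter per frequency bin. This is a generic textbook construction chosen for the example; the abstract does not state which filter the authors use, and the function name `mask_based_mwf`, the reference channel, and the regularization constant are all assumptions.

```python
import numpy as np

def mask_based_mwf(X, speech_mask, ref_channel=0, eps=1e-6):
    """Mask-based multichannel Wiener filter (illustrative sketch).

    X           : complex STFT, shape (channels, freqs, frames)
    speech_mask : values in [0, 1], shape (freqs, frames)
    Returns the single-channel enhanced STFT, shape (freqs, frames).
    """
    C, F, T = X.shape
    noise_mask = 1.0 - speech_mask
    Y = np.empty((F, T), dtype=complex)
    for f in range(F):
        Xf = X[:, f, :]                                  # (C, T)
        # Mask-weighted spatial covariance estimates for speech and noise.
        Rs = (speech_mask[f] * Xf) @ Xf.conj().T / max(speech_mask[f].sum(), eps)
        Rn = (noise_mask[f] * Xf) @ Xf.conj().T / max(noise_mask[f].sum(), eps)
        Rn = Rn + eps * np.eye(C)                        # keep the system invertible
        # Wiener solution w = (Rs + Rn)^{-1} Rs e_ref, output y = w^H x.
        w = np.linalg.solve(Rs + Rn, Rs[:, ref_channel])
        Y[f] = w.conj() @ Xf
    return Y
```

For first-order Ambisonics, `X` would carry four channels (W, X, Y, Z), and `speech_mask` would come from the mask-estimation network. Here any array of values in [0, 1] with the right shape can stand in for it.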
Taxonomy
Methods: Max Pooling · Concatenated Skip Connection · Convolution · U-Net · Tanh Activation · Sigmoid Activation · Long Short-Term Memory
