Jointly optimal denoising, dereverberation, and source separation
Tomohiro Nakatani, Christoph Boeddeker, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Reinhold Haeb-Umbach

TL;DR
This paper introduces a new, computationally efficient method for jointly optimizing denoising, dereverberation, and source separation in speech processing, outperforming traditional cascade approaches.
Contribution
It develops a novel objective function and algorithms for jointly optimizing a convolutional beamformer for denoising, dereverberation, and source separation (DN+DR+SS), improving performance and reducing computational cost.
Findings
Significant improvement in speech enhancement metrics.
Enhanced automatic speech recognition accuracy.
Reduced computational complexity compared to previous methods.
Abstract
This paper proposes methods that can optimize a Convolutional BeamFormer (CBF) for jointly performing denoising, dereverberation, and source separation (DN+DR+SS) in a computationally efficient way. Conventionally, a cascade configuration composed of a Weighted Prediction Error minimization (WPE) dereverberation filter followed by a Minimum Variance Distortionless Response beamformer has been used as the state-of-the-art frontend of far-field speech recognition; however, the overall optimality of this approach is not guaranteed. In the blind signal processing area, an approach for jointly optimizing dereverberation and source separation (DR+SS) has been proposed; however, this approach requires huge computing cost and has not been extended for application to DN+DR+SS. To overcome the above limitations, this paper develops new approaches for jointly optimizing DN+DR+SS in a computationally…
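As background on the second stage of the conventional cascade named in the abstract, the following is a minimal sketch of textbook Minimum Variance Distortionless Response (MVDR) beamformer weights for a single frequency bin, w = R⁻¹h / (hᴴR⁻¹h). It is not the paper's joint optimization method. The function name `mvdr_weights`, the random noise covariance, and the toy steering vector are assumptions made purely for illustration.

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """Textbook MVDR weights for one frequency bin (illustrative helper).

    noise_cov: (M, M) Hermitian positive-definite noise covariance R.
    steering:  (M,) steering vector h toward the target source.
    """
    # Solve R x = h instead of forming the explicit inverse.
    rinv_h = np.linalg.solve(noise_cov, steering)
    # Normalize so that the target direction passes undistorted.
    return rinv_h / (steering.conj() @ rinv_h)

# Toy data (assumed, not taken from the paper): 4 microphones.
rng = np.random.default_rng(0)
M = 4
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
noise_cov = A @ A.conj().T + M * np.eye(M)           # Hermitian positive definite
steering = np.exp(-1j * np.pi * 0.3 * np.arange(M))  # toy phase-only steering vector

w = mvdr_weights(noise_cov, steering)

# Distortionless constraint: w^H h should equal 1.
response = w.conj() @ steering

# Residual noise power, compared against a plain delay-and-sum beamformer.
w_das = steering / (steering.conj() @ steering)
noise_mvdr = (w.conj() @ noise_cov @ w).real
noise_das = (w_das.conj() @ noise_cov @ w_das).real
```

Both beamformers in this sketch satisfy the same distortionless constraint, so comparing `noise_mvdr` with `noise_das` shows how much residual noise the MVDR solution removes on this toy covariance.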
