Jointly optimal denoising, dereverberation, and source separation

Tomohiro Nakatani; Christoph Boeddeker; Keisuke Kinoshita; Rintaro Ikeshita; Marc Delcroix; Reinhold Haeb-Umbach

arXiv:2005.09843·eess.AS·August 4, 2020

TL;DR

This paper introduces a new, computationally efficient method for jointly optimizing denoising, dereverberation, and source separation in speech processing, outperforming traditional cascade approaches.

Contribution

It develops a novel objective function and algorithms for jointly optimizing a convolutional beamformer (CBF) that performs denoising, dereverberation, and source separation (DN+DR+SS), improving performance while reducing computational cost.
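As a rough illustration of what a convolutional beamformer computes, the sketch below assumes the common formulation in which, in each frequency bin, a single filter `w` is applied to the current multichannel STFT frame stacked with `taps` past frames that start at a prediction delay `delay`. The helper names `stack_taps` and `apply_cbf` and the zero-padding before the first frame are hypothetical choices. The sketch only applies a given filter; it does not implement the paper's joint optimization of that filter.

```python
import numpy as np

# Hypothetical helpers illustrating the generic CBF filtering step in one
# frequency bin; the filter w is assumed to be given (not optimized here).


def stack_taps(x, delay, taps):
    """Build x_bar[t] = [x[t]; x[t-delay]; ...; x[t-delay-taps+1]].

    x: complex STFT frames of shape (T, M) (T frames, M microphones).
    Returns shape (T, M * (taps + 1)); frames before t = 0 are treated as zeros.
    """
    T, M = x.shape
    out = np.zeros((T, M * (taps + 1)), dtype=complex)
    out[:, :M] = x  # current frame
    for k in range(taps):
        shift = delay + k
        if shift >= T:
            continue  # this tap reaches before the first frame for every t
        out[shift:, M * (k + 1):M * (k + 2)] = x[:T - shift]
    return out


def apply_cbf(w, x, delay, taps):
    """Single-output convolutional beamformer: y[t] = w^H x_bar[t]."""
    return stack_taps(x, delay, taps) @ w.conj()


# Usage: a filter that only picks microphone 0 in the current frame
# reduces to selecting that channel.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 2)) + 1j * rng.standard_normal((8, 2))
w = np.zeros(2 * (3 + 1), dtype=complex)
w[0] = 1.0
y = apply_cbf(w, x, delay=2, taps=3)
```

With `taps=0` the stacked vector is just the current frame, so the same code reduces to an ordinary instantaneous beamformer; the delayed taps are what let a single filter also act on reverberation.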

Findings

01

Significant improvement in speech enhancement metrics.

02

Enhanced automatic speech recognition accuracy.

03

Reduced computational complexity compared to previous methods.

Abstract

This paper proposes methods that can optimize a Convolutional BeamFormer (CBF) for jointly performing denoising, dereverberation, and source separation (DN+DR+SS) in a computationally efficient way. Conventionally, a cascade configuration composed of a Weighted Prediction Error minimization (WPE) dereverberation filter followed by a Minimum Variance Distortionless Response (MVDR) beamformer has been used as the state-of-the-art frontend of far-field speech recognition; however, the overall optimality of this approach is not guaranteed. In the blind signal processing area, an approach for jointly optimizing dereverberation and source separation (DR+SS) has been proposed; however, this approach requires a huge computational cost and has not been extended for application to DN+DR+SS. To overcome the above limitations, this paper develops new approaches for jointly optimizing DN+DR+SS in a computationally…
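For readers unfamiliar with the conventional cascade the abstract refers to, the following NumPy sketch runs WPE dereverberation followed by an MVDR beamformer in a single frequency bin. It is a simplified illustration under stated assumptions, not the paper's implementation: the per-frame power is the channel-averaged instantaneous power of the current estimate, the steering vector `steering` and noise covariance `noise_cov` are assumed to be estimated elsewhere (for example from time-frequency masks), and the function names and default parameters (`delay=3`, `taps=10`, `iterations=3`) are hypothetical choices.

```python
import numpy as np

# Hypothetical helpers; steering vector and noise covariance are assumed given.


def past_frames(x, delay, taps):
    """Stack [x[t-delay]; ...; x[t-delay-taps+1]] for each t; x has shape (T, M)."""
    T, M = x.shape
    out = np.zeros((T, M * taps), dtype=complex)
    for k in range(taps):
        shift = delay + k
        if shift >= T:
            continue  # no past frame available for any t
        out[shift:, M * k:M * (k + 1)] = x[:T - shift]
    return out


def wpe(x, delay=3, taps=10, iterations=3, eps=1e-10):
    """Iterative WPE dereverberation in one frequency bin (x: shape (T, M))."""
    x_tilde = past_frames(x, delay, taps)
    y = x.copy()
    for _ in range(iterations):
        # Time-varying power of the desired signal, averaged over channels.
        lam = np.maximum(np.mean(np.abs(y) ** 2, axis=1), eps)
        weighted = x_tilde / lam[:, None]
        R = weighted.T @ x_tilde.conj()  # sum_t x~_t x~_t^H / lam_t
        P = weighted.T @ x.conj()        # sum_t x~_t x_t^H / lam_t
        R += eps * np.eye(R.shape[0])    # small diagonal loading
        G = np.linalg.solve(R, P)        # prediction filter, shape (M*taps, M)
        y = x - x_tilde @ G.conj()       # y_t = x_t - G^H x~_t
    return y


def mvdr(y, steering, noise_cov):
    """MVDR: w = R_n^{-1} h / (h^H R_n^{-1} h); returns w^H y_t for each frame."""
    num = np.linalg.solve(noise_cov, steering)
    w = num / (steering.conj() @ num)
    return y @ w.conj()


# Cascade usage on random data (illustrative only).
rng = np.random.default_rng(1)
T, M = 200, 2
x = rng.standard_normal((T, M)) + 1j * rng.standard_normal((T, M))
h = np.array([1.0, 0.5 - 0.5j])            # assumed steering vector
Rn = np.eye(M) + 0.1 * np.ones((M, M))     # assumed noise covariance
enhanced = mvdr(wpe(x, delay=3, taps=5), h, Rn)
```

Here the WPE filter is estimated without regard to the beamformer that follows it, so each stage is optimized separately; this is the sense in which the abstract says overall optimality of the cascade is not guaranteed.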
