Blind and neural network-guided convolutional beamformer for joint denoising, dereverberation, and source separation

Tomohiro Nakatani; Rintaro Ikeshita; Keisuke Kinoshita; Hiroshi Sawada; Shoko Araki

arXiv:2108.01836·eess.AS·August 5, 2021


TL;DR

This paper introduces a convolutional beamformer (CBF) that jointly performs denoising, dereverberation, and source separation without prior information on the sources or the room acoustics, and that can additionally be guided by a neural network to improve speech quality and recognition.
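As a rough illustration of how a convolutional beamformer can act on multichannel STFT frames, the sketch below assumes a source-wise factorized form at a single frequency bin. For each source n, a multichannel linear-prediction (WPE-style) filter G_n removes late reverberation, z_t = x_t - G_n^H x~_t, where x~_t stacks past frames starting at a prediction delay D. A spatial vector q_n then extracts the source, y_t = q_n^H z_t. G_n is estimated by least squares weighted by a time-varying source power lambda_{n,t}. The power estimate, the separation vector q_n (which would come from a separation step such as IVE, not shown), and the function names `stack_past`, `sourcewise_wpe_filter`, and `apply_sourcewise_cbf` are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def stack_past(x, delay, taps):
    """Build x~_t = [x_{t-D}; ...; x_{t-D-L+1}] for one frequency bin.

    x: (M, T) complex STFT of M microphones at one frequency.
    Returns an (M * taps, T) array; frames before the signal start are zero.
    """
    M, T = x.shape
    out = np.zeros((M * taps, T), dtype=x.dtype)
    for l in range(taps):
        shift = delay + l
        if shift < T:
            out[l * M:(l + 1) * M, shift:] = x[:, :T - shift]
    return out


def sourcewise_wpe_filter(x, x_past, power, eps=1e-8):
    """Power-weighted least-squares prediction filter G_n for one source.

    power: (T,) time-varying power lambda_{n,t} of source n (assumed given).
    Minimizes sum_t |x_t - G^H x~_t|^2 / lambda_t, i.e. G = R^{-1} P with
    R = sum_t x~_t x~_t^H / lambda_t and P = sum_t x~_t x_t^H / lambda_t.
    A tiny diagonal loading eps keeps R invertible.
    """
    w = 1.0 / np.maximum(power, eps)          # per-frame weights 1 / lambda_t
    R = (x_past * w) @ x_past.conj().T         # (M*taps, M*taps)
    P = (x_past * w) @ x.conj().T              # (M*taps, M)
    return np.linalg.solve(R + eps * np.eye(R.shape[0]), P)


def apply_sourcewise_cbf(x, x_past, G, q):
    """Dereverberate with G (z_t = x_t - G^H x~_t), then beamform with q."""
    z = x - G.conj().T @ x_past                # (M, T) dereverberated mixture
    return q.conj() @ z                        # (T,) output y_t = q^H z_t
```

The two stages together are still a single convolutional beamformer: y_t = q_n^H x_t + (-G_n q_n)^H x~_t, i.e. one filter w_n = [q_n; -G_n q_n] applied to the stacked vector [x_t; x~_t]. In this sketch, the source-wise form simply gives each source its own pair (G_n, q_n).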

Contribution

It develops a blind CBF optimization method by extending a conventional joint dereverberation and source separation method, and integrates neural network-based estimation of source power spectra into the CBF optimization, advancing joint speech enhancement.

Findings

1. Significant improvement in speech recognition accuracy.
2. Superior signal quality compared to state-of-the-art methods.
3. Effective joint denoising, dereverberation, and source separation.

Abstract

This paper proposes an approach for optimizing a Convolutional BeamFormer (CBF) that can jointly perform denoising (DN), dereverberation (DR), and source separation (SS). First, we develop a blind CBF optimization algorithm that requires no prior information on the sources or the room acoustics, by extending a conventional joint DR and SS method. To make the optimization computationally tractable, we incorporate two techniques into the approach: the Source-Wise Factorization (SW-Fact) of a CBF and Independent Vector Extraction (IVE). To further improve the performance, we develop a method that integrates neural network (NN)-based source power spectra estimation with CBF optimization through an inverse-Gamma prior. Experiments using noisy reverberant mixtures reveal that our proposed method with both blind and NN-guided scenarios greatly outperforms the conventional state-of-the-art…
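To show how an inverse-Gamma prior can pull a source power estimate toward an NN prediction, here is a worked sketch under an assumed per-bin model, which may differ from the paper's exact formulation. Suppose y_{t,f} | lambda ~ CN(0, lambda) and lambda ~ InvGamma(alpha, beta), with beta = (alpha + 1) * lambda_nn so that the prior mode beta / (alpha + 1) equals the NN estimate. Then log p(y | lambda) + log p(lambda) = -(alpha + 2) log lambda - (|y|^2 + beta) / lambda + const, and setting the derivative to zero gives lambda_map = (|y|^2 + beta) / (alpha + 2). The function name `map_power_inverse_gamma` and the hyperparameter `alpha` are hypothetical.

```python
import numpy as np


def map_power_inverse_gamma(y, lam_nn, alpha=2.0):
    """MAP source power under an inverse-Gamma prior (illustrative model).

    Assumed model, per time-frequency bin:
      y | lam ~ CN(0, lam)
      lam ~ InvGamma(alpha, beta), beta = (alpha + 1) * lam_nn
    beta is chosen so the prior mode equals the NN estimate lam_nn.
    Maximizing -(alpha + 2) * log(lam) - (|y|^2 + beta) / lam over lam
    gives lam = (|y|^2 + beta) / (alpha + 2).
    """
    beta = (alpha + 1.0) * lam_nn
    return (np.abs(y) ** 2 + beta) / (alpha + 2.0)


# Toy usage: 2 x 2 array of STFT bins and NN-predicted powers.
y = np.array([[2.0 + 0j, 0.0], [1j, 3.0]])
lam_nn = np.array([[1.0, 4.0], [2.0, 0.5]])
lam = map_power_inverse_gamma(y, lam_nn, alpha=2.0)
```

The result is a convex combination, lambda_map = |y|^2 / (alpha + 2) + (alpha + 1) * lambda_nn / (alpha + 2). Under this assumption, alpha acts as a confidence knob: a small alpha leans on the observed power, and as alpha grows the estimate approaches the NN prediction.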
