Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising

Yoto Fujita; Aditya Arie Nugraha; Diego Di Carlo; Yoshiaki Bando; Mathieu Fontaine; and Kazuyoshi Yoshii

arXiv:2410.22805 · cs.SD · October 31, 2024

TL;DR

This paper introduces run-time adaptation of a neural beamformer that jointly performs dereverberation and denoising, making speech enhancement in real environments more robust and improving automatic speech recognition (ASR) performance under mismatched conditions.

Contribution

It proposes a unified weighted power minimization distortionless response (WPD) beamformer driven by a deep neural network (DNN) for joint dereverberation and denoising, which enables effective run-time adaptation in noisy, reverberant environments.
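
As a rough illustration of how a single WPD filter can handle dereverberation and denoising together, the NumPy sketch below computes a WPD convolutional beamformer for one frequency bin from a speech mask. It assumes the DNN outputs a real-valued time-frequency mask. How the paper actually converts masks into the target power and steering vector is not stated here, so the choices in the code (masked reference-channel power, principal eigenvector of the mask-weighted covariance) are common assumptions. The helper names `stack_taps` and `wpd_filter` and all defaults are hypothetical.

```python
import numpy as np


def stack_taps(X, delay, taps):
    """Stack x̄_t = [x_t; x_{t-D}; ...; x_{t-D-L+1}] for one frequency bin.

    X: (M, T) complex STFT coefficients of M microphones.
    Returns an (M * (taps + 1), T) array; frames before t = 0 are zero.
    """
    M, T = X.shape
    if delay + taps > T:
        raise ValueError("signal too short for the requested delay and taps")
    blocks = [X]
    for k in range(taps):
        shift = delay + k
        shifted = np.zeros_like(X)
        shifted[:, shift:] = X[:, :T - shift]
        blocks.append(shifted)
    return np.concatenate(blocks, axis=0)


def wpd_filter(X, mask, delay=3, taps=5, eps=1e-8):
    """Mask-driven WPD beamformer for one frequency bin (hypothetical helper).

    X:    (M, T) complex mixture STFT; channel 0 is the reference mic.
    mask: (T,) real speech mask in [0, 1], e.g. a DNN output (assumed).
    Returns the filter w, the stacked steering vector v̄, and y_t = w^H x̄_t.
    """
    M, T = X.shape
    # Assumption: time-varying target power from the masked reference channel.
    lam = np.maximum(np.abs(mask * X[0]) ** 2, eps)
    # Assumption: steering vector = principal eigenvector of the mask-weighted
    # speech covariance, normalized so its reference-mic entry is 1.
    phi_s = (mask * X) @ X.conj().T / max(mask.sum(), eps)
    _, eigvecs = np.linalg.eigh(phi_s)
    v = eigvecs[:, -1] / eigvecs[0, -1]
    # Power-normalized covariance of the stacked (current + delayed) frames.
    Xb = stack_taps(X, delay, taps)
    R = (Xb / lam) @ Xb.conj().T / T + eps * np.eye(Xb.shape[0])
    # The distortionless constraint applies only to the current-frame block.
    vb = np.concatenate([v, np.zeros(M * taps, dtype=complex)])
    Rinv_v = np.linalg.solve(R, vb)
    w = Rinv_v / (vb.conj() @ Rinv_v)
    return w, vb, w.conj() @ Xb
```

Setting `taps=0` collapses the stacked vector to the current frame and the filter to a mask-driven weighted MPDR beamformer. The delayed taps are what let one filter suppress late reverberation and noise at the same time, instead of cascading a separate dereverberation stage before the beamformer.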

Findings

01

Run-time adaptation improves speech enhancement performance.

02

The proposed method outperforms previous cascaded approaches.

03

The method is effective across various speaker, reverberation, and noise conditions.

Abstract

This paper describes speech enhancement for real-time automatic speech recognition (ASR) in real environments. A standard approach to this task is neural beamforming, which can work efficiently in an online manner. It estimates the masks of clean dry speech from a noisy echoic mixture spectrogram with a deep neural network (DNN) and then computes an enhancement filter used for beamforming. The performance of such a supervised approach, however, is drastically degraded under mismatched conditions. This calls for run-time adaptation of the DNN. Although the ground-truth speech spectrogram required for adaptation is not available at run time, blind dereverberation and separation methods such as weighted prediction error (WPE) and fast multichannel nonnegative matrix factorization (FastMNMF) can be used to generate pseudo ground-truth data from a mixture. Based on this idea, a prior…
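
WPE, one of the blind methods named above for producing pseudo ground truth, removes late reverberation by subtracting a delayed multichannel linear prediction from each frame, with the prediction filters estimated by iteratively reweighted least squares. The NumPy sketch below follows the textbook iterative formulation for a single frequency bin and is not the authors' implementation; the helper names `delayed_frames` and `wpe_bin` and the defaults (prediction delay 3, 10 taps, 3 iterations) are illustrative assumptions.

```python
import numpy as np


def delayed_frames(X, delay, taps):
    """Stack x̃_t = [x_{t-D}; ...; x_{t-D-K+1}] (past frames only) for one bin."""
    M, T = X.shape
    if delay + taps > T:
        raise ValueError("signal too short for the requested delay and taps")
    out = np.zeros((M * taps, T), dtype=X.dtype)
    for k in range(taps):
        shift = delay + k
        out[k * M:(k + 1) * M, shift:] = X[:, :T - shift]
    return out


def wpe_bin(X, delay=3, taps=10, iterations=3, eps=1e-8):
    """Iterative multichannel WPE for one frequency bin (hypothetical helper).

    X: (M, T) complex STFT frames. Returns the dereverberated (M, T) estimate.
    """
    M, T = X.shape
    Xd = delayed_frames(X, delay, taps)
    D = X.copy()
    for _ in range(iterations):
        # Time-varying power of the desired signal, from the current estimate.
        lam = np.maximum(np.mean(np.abs(D) ** 2, axis=0), eps)
        Xw = Xd / lam
        # Power-weighted covariance of past frames and cross-correlation with x_t.
        R = Xw @ Xd.conj().T + eps * np.eye(M * taps)
        P = Xw @ X.conj().T
        # (M * taps, M) prediction filters G = R^{-1} P.
        G = np.linalg.solve(R, P)
        # Subtract the predicted late reverberation: d_t = x_t - G^H x̃_t.
        D = X - G.conj().T @ Xd
    return D
```

In a full pipeline this would run independently on every frequency bin of the multichannel STFT, and its output (possibly further processed by FastMNMF) would serve as the adaptation target for the mask-estimation DNN.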

Taxonomy

Topics: Speech and Audio Processing · Advanced Data Compression Techniques · Speech Recognition and Synthesis