DNN-Free Low-Latency Adaptive Speech Enhancement Based on Frame-Online Beamforming Powered by Block-Online FastMNMF

Aditya Arie Nugraha; Kouhei Sekiguchi; Mathieu Fontaine; Yoshiaki Bando; Kazuyoshi Yoshii

arXiv:2207.10934·eess.AS·July 25, 2022

TL;DR

This paper introduces a DNN-free, low-latency speech enhancement system that combines frame-online beamforming with block-online FastMNMF, enabling rapid adaptation to environment changes without neural network training.

Contribution

It proposes a novel DNN-free approach that uses FastMNMF posteriors for real-time covariance estimation, improving response speed and speech recognition accuracy.

Findings

01

Outperforms DNN-based beamforming by 5.0 WER points.

02

Enables quick adaptation to scene changes.

03

Operates efficiently in frame-online processing.

Abstract

This paper describes a practical dual-process speech enhancement system that adapts environment-sensitive frame-online beamforming (front-end) with help from environment-free block-online source separation (back-end). To use minimum variance distortionless response (MVDR) beamforming, one may train a deep neural network (DNN) that estimates time-frequency masks used for computing the covariance matrices of sources (speech and noise). Backpropagation-based run-time adaptation of the DNN was proposed for dealing with the mismatched training-test conditions. Instead, one may try to directly estimate the source covariance matrices with a state-of-the-art blind source separation method called fast multichannel non-negative matrix factorization (FastMNMF). In practice, however, neither the DNN nor the FastMNMF can be updated in a frame-online manner due to its computationally-expensive…
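The mask-based MVDR pipeline mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, array shapes, and the choice of the reference-channel (Souden-style) MVDR formula are assumptions made here for illustration, and the masks could come from any source (a DNN, or posteriors from a separation method such as FastMNMF).

```python
import numpy as np

def spatial_covariance(X, mask, eps=1e-10):
    """Mask-weighted spatial covariance per frequency bin.

    X:    complex STFT, shape (F, T, M) = (freq bins, frames, mics)  [assumed layout]
    mask: real-valued time-frequency mask in [0, 1], shape (F, T)
    Returns an array of shape (F, M, M).
    """
    num = np.einsum("ft,ftm,ftn->fmn", mask, X, X.conj())
    den = mask.sum(axis=1)[:, None, None] + eps
    return num / den

def mvdr_weights(R_s, R_n, ref=0, eps=1e-10):
    """Reference-channel MVDR: w = (R_n^{-1} R_s) u_ref / trace(R_n^{-1} R_s).

    R_s, R_n: speech and noise covariances, shape (F, M, M)
    ref:      index of the reference microphone (hypothetical default 0)
    Returns beamformer weights of shape (F, M).
    """
    M = R_n.shape[-1]
    R_n = R_n + eps * np.eye(M)          # small diagonal loading for stability
    G = np.linalg.solve(R_n, R_s)        # batched R_n^{-1} R_s, shape (F, M, M)
    tr = np.trace(G, axis1=-2, axis2=-1)[:, None]
    return G[..., ref] / (tr + eps)

def apply_beamformer(w, X):
    """Enhanced signal y[f, t] = w[f]^H x[f, t], shape (F, T)."""
    return np.einsum("fm,ftm->ft", w.conj(), X)
```

In this sketch the speech covariance would be computed with a speech mask and the noise covariance with its complement (for example `1 - mask`). A frame-online system would additionally update these covariances recursively as frames arrive, which is not shown here.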
