DNN-Free Low-Latency Adaptive Speech Enhancement Based on Frame-Online Beamforming Powered by Block-Online FastMNMF
Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii

TL;DR
This paper introduces a DNN-free, low-latency speech enhancement system that combines frame-online beamforming with block-online FastMNMF, enabling rapid adaptation to environment changes without neural network training.
Contribution
It proposes a novel DNN-free approach that uses FastMNMF posteriors for real-time covariance estimation, improving response speed and speech recognition accuracy.
Findings
Outperforms DNN-based beamforming by 5.0 points in word error rate (WER).
Enables quick adaptation to scene changes.
Operates efficiently in frame-online processing.
Abstract
This paper describes a practical dual-process speech enhancement system that adapts environment-sensitive frame-online beamforming (front-end) with help from environment-free block-online source separation (back-end). To use minimum variance distortionless response (MVDR) beamforming, one may train a deep neural network (DNN) that estimates time-frequency masks used for computing the covariance matrices of sources (speech and noise). Backpropagation-based run-time adaptation of the DNN was proposed for dealing with the mismatched training-test conditions. Instead, one may try to directly estimate the source covariance matrices with a state-of-the-art blind source separation method called fast multichannel non-negative matrix factorization (FastMNMF). In practice, however, neither the DNN nor the FastMNMF can be updated in a frame-online manner due to its computationally-expensive…
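As a rough illustration of the mask-based MVDR step mentioned in the abstract, the sketch below computes speech and noise spatial covariance matrices from time-frequency masks and derives beamformer weights from them. It is a minimal NumPy sketch, not the authors' implementation: the function names (`masked_covariance`, `mvdr_weights`, `apply_beamformer`), the array layout (frequency, time, microphone), the reference-microphone formulation of MVDR, and the small diagonal loading term `eps` are all assumptions made for the example. In the paper's setting the masks would come from a DNN or from FastMNMF posteriors; here they are simply an input.

```python
import numpy as np

def masked_covariance(X, mask, eps=1e-10):
    """Mask-weighted spatial covariance per frequency bin.

    X:    complex STFT, shape (F, T, M) = (freq bins, frames, mics)
    mask: real weights in [0, 1], shape (F, T)
    Returns an array of shape (F, M, M).
    """
    # Weighted sum over frames of the outer products x x^H
    num = np.einsum("ft,ftm,ftn->fmn", mask, X, X.conj())
    return num / (mask.sum(axis=1)[:, None, None] + eps)

def mvdr_weights(R_s, R_n, ref=0, eps=1e-10):
    """MVDR weights in the reference-microphone form (an assumed choice):
    w = (R_n^{-1} R_s) u_ref / trace(R_n^{-1} R_s).
    Returns an array of shape (F, M).
    """
    M = R_n.shape[-1]
    # Small diagonal loading keeps the solve numerically stable
    R_n = R_n + eps * np.eye(M)
    G = np.linalg.solve(R_n, R_s)              # R_n^{-1} R_s, shape (F, M, M)
    tr = np.trace(G, axis1=-2, axis2=-1)       # shape (F,)
    return G[..., ref] / (tr[:, None] + eps)   # column `ref` of G, normalized

def apply_beamformer(w, X):
    """Beamformer output y(f, t) = w(f)^H x(f, t), shape (F, T)."""
    return np.einsum("fm,ftm->ft", w.conj(), X)
```

Given a speech mask `m_s` and a noise mask `m_n`, a call sequence such as `apply_beamformer(mvdr_weights(masked_covariance(X, m_s), masked_covariance(X, m_n)), X)` would yield the enhanced single-channel spectrogram. This batch version uses all frames at once; a frame-online variant would update the covariance statistics recursively as frames arrive.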
