Direction-Aware Adaptive Online Neural Speech Enhancement with an Augmented Reality Headset in Real Noisy Conversational Environments

Kouhei Sekiguchi; Aditya Arie Nugraha; Yicheng Du; Yoshiaki Bando; Mathieu Fontaine; Kazuyoshi Yoshii

arXiv:2207.07296·eess.AS·July 18, 2022

TL;DR

This paper presents a dual-process online speech enhancement method for AR headsets that combines DNN-based beamforming with FastMNMF-guided adaptation, significantly improving speech recognition in noisy environments.

Contribution

It introduces a novel dual-process online speech enhancement approach that adaptively combines deep neural network beamforming with FastMNMF for real-time AR applications.

Findings

01

Word error rate improved by over 10 points with 12 minutes of adaptation.

02

Method effectively handles real noisy, reverberant environments.

03

AR transcription accuracy enhanced through spatial and temporal processing.
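The first finding is stated in word error rate (WER) points, that is, as an absolute difference between two WER percentages. As a reminder of how the metric is usually computed, here is a minimal sketch: a word-level Levenshtein distance divided by the reference length. The function name `word_error_rate` is hypothetical, and this is not the scoring tool used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

An improvement "in points" is the plain difference between two such values, for example the WER of the transcript before adaptation minus the WER after adaptation.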

Abstract

This paper describes the practical response- and performance-aware development of online speech enhancement for an augmented reality (AR) headset that helps a user understand conversations held in real noisy, echoic environments (e.g., a cocktail party). One may use a state-of-the-art blind source separation method called fast multichannel nonnegative matrix factorization (FastMNMF), which works well in various environments thanks to its unsupervised nature. Its heavy computational cost, however, prevents its application to real-time processing. In contrast, a supervised beamforming method that uses a deep neural network (DNN) to estimate spatial information of speech and noise readily fits real-time processing, but suffers from drastic performance degradation in mismatched conditions. Given such complementary characteristics, we propose a dual-process robust online speech enhancement…
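To make "a DNN for estimating spatial information of speech and noise" more concrete, the sketch below shows one common form of such a front end: a mask-based MVDR beamformer. It assumes the DNN outputs a time-frequency speech mask, which is not stated in the truncated abstract above. The function name `mask_based_mvdr`, its parameters, and the array layout are all hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def mask_based_mvdr(X, speech_mask, ref_mic=0, eps=1e-6):
    """Apply a mask-based MVDR beamformer (illustrative sketch).

    X           : complex STFT of the mixture, shape (F, T, M)
    speech_mask : assumed DNN output in [0, 1], shape (F, T)
    ref_mic     : index of the reference microphone
    Returns the enhanced single-channel STFT, shape (F, T).
    """
    n_mics = X.shape[2]
    noise_mask = 1.0 - speech_mask

    # Mask-weighted spatial covariance matrices, shape (F, M, M).
    phi_s = np.einsum("ft,ftm,ftn->fmn", speech_mask, X, X.conj())
    phi_s /= speech_mask.sum(axis=1)[:, None, None] + eps
    phi_n = np.einsum("ft,ftm,ftn->fmn", noise_mask, X, X.conj())
    phi_n /= noise_mask.sum(axis=1)[:, None, None] + eps
    phi_n += eps * np.eye(n_mics)  # diagonal loading for numerical stability

    # MVDR filter in the reference-channel form:
    #   w = (phi_n^{-1} phi_s) u / trace(phi_n^{-1} phi_s),
    # where u is the one-hot vector that selects the reference microphone.
    ratio = np.linalg.solve(phi_n, phi_s)          # (F, M, M)
    trace = np.trace(ratio, axis1=1, axis2=2)      # (F,)
    w = ratio[:, :, ref_mic] / (trace[:, None] + eps)

    # Beamformer output y = w^H x for every time-frequency bin.
    return np.einsum("fm,ftm->ft", w.conj(), X)
```

In this picture, the quality of the output depends entirely on the mask, which is where a mismatch between training and deployment conditions would hurt; the abstract's proposal is to pair such a real-time front end with the slower but unsupervised FastMNMF.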

Code & Models

Repositories

sekiguchi92/SpeechEnhancement
pytorch
