Multi-Channel Speech Denoising for Machine Ears

Cong Han; E. Merve Kaya; Kyle Hoefer; Malcolm Slaney; Simon Carlile

arXiv:2202.08793·eess.AS·February 18, 2022

Open Access

TL;DR

This paper presents a multi-channel speech denoising system for machine ears that combines neural networks and unsupervised clustering to enhance speech clarity in noisy, reverberant environments, improving intelligibility and user experience.

Contribution

The paper introduces an MCSDN-Beamforming-MCSDN framework for inference and uses an unsupervised method, the complex angular central Gaussian mixture model (cACGMM), to extract cleaner speech from noisy recordings as the training target.

Findings

1. cACGMM improves training data quality

2. System enhances speech intelligibility in noisy environments

3. Subjective evaluations favor the proposed approach

Abstract

This work describes a speech denoising system for machine ears that aims to improve speech intelligibility and the overall listening experience in noisy environments. We recorded approximately 100 hours of audio data with reverberation and moderate environmental noise using a pair of microphone arrays placed around each of the two ears, and then mixed the recordings to simulate adverse acoustic scenes. We then trained a multi-channel speech denoising network (MCSDN) on the mixtures. To improve the training, we employ an unsupervised method, the complex angular central Gaussian mixture model (cACGMM), to acquire cleaner speech from the noisy recordings to serve as the learning target. We propose an MCSDN-Beamforming-MCSDN framework for the inference stage. The results of the subjective evaluation show that the cACGMM improves the training data, resulting in better noise reduction…
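
The abstract says recordings were mixed to simulate adverse acoustic scenes, but it does not give the mixing recipe. The following is a minimal sketch of one common way to do this, assuming each mixture is built at a chosen signal-to-noise ratio; that assumption, the function names `power` and `mix_at_snr`, and the list-of-channels layout are illustrative and not taken from the paper.

```python
import math
from typing import List

# Hypothetical layout: one list of samples per microphone channel.
Channels = List[List[float]]


def power(x: Channels) -> float:
    """Mean squared amplitude over all channels and samples."""
    n = sum(len(ch) for ch in x)
    if n == 0:
        raise ValueError("signal is empty")
    return sum(s * s for ch in x for s in ch) / n


def mix_at_snr(speech: Channels, noise: Channels, snr_db: float) -> Channels:
    """Add noise to speech so that the mixture has the requested SNR in dB.

    Assumption: a single gain is applied to every noise channel, so the
    level differences between microphones in the noise recording are kept.
    """
    same_shape = len(speech) == len(noise) and all(
        len(s) == len(n) for s, n in zip(speech, noise)
    )
    if not same_shape:
        raise ValueError("speech and noise must have the same shape")

    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise is silent; SNR is undefined")

    # SNR = P_speech / (gain^2 * P_noise)  =>  solve for gain.
    gain = math.sqrt(power(speech) / (p_noise * 10.0 ** (snr_db / 10.0)))

    return [
        [s + gain * n for s, n in zip(s_ch, n_ch)]
        for s_ch, n_ch in zip(speech, noise)
    ]
```

Calling `mix_at_snr(speech, noise, 0.0)` would give a mixture in which the scaled noise has the same average power as the speech. Lower values of `snr_db` give more adverse scenes.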


Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing