Consistency-aware multi-channel speech enhancement using deep neural networks

Yoshiki Masuyama; Masahito Togami; Tatsuya Komatsu

arXiv:2002.05831 · eess.AS · February 17, 2020 · 1 citation

TL;DR

This paper introduces a DNN-based multi-channel speech enhancement system whose training objective is computed on the reconstructed time-domain signal rather than on the estimated T-F mask or spectrogram, so that amplitude and phase changes caused by spectrogram inconsistency are taken into account.

Contribution

It proposes a novel objective function based on reconstructed time-domain signals for training DNN-based multi-channel speech enhancement systems.

Findings

01. Improved speech quality over traditional T-F masking methods

02. Effective enhancement with reconstructed time-domain objective functions

03. Demonstrated superiority in experimental comparisons

Abstract

This paper proposes a deep neural network (DNN)-based multi-channel speech enhancement system in which a DNN is trained to maximize the quality of the enhanced time-domain signal. DNN-based multi-channel speech enhancement is often conducted in the time-frequency (T-F) domain because spatial filtering can be efficiently implemented in the T-F domain. In such a case, ordinary objective functions are computed on the estimated T-F mask or spectrogram. However, the estimated spectrogram is often inconsistent, and its amplitude and phase may change when the spectrogram is converted back to the time domain. That is, the objective function does not evaluate the enhanced time-domain signal properly. To address this problem, we propose to use an objective function defined on the reconstructed time-domain signal. Specifically, speech enhancement is conducted by multi-channel Wiener filtering in…
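The inconsistency mentioned in the abstract can be illustrated with a small single-channel NumPy sketch. It is not the authors' system: the helper names (`stft`, `istft`, `inconsistency`), the Hann window, the frame length of 256, the hop of 128, and the random mask standing in for a DNN-estimated T-F mask are all assumptions made for illustration. The sketch measures how much a spectrogram changes after a round trip through the time domain.

```python
import numpy as np

def stft(x, win, hop):
    """Short-time Fourier transform; returns an array of shape (frames, bins)."""
    n_fft = len(win)
    pad = n_fft - hop  # zero-pad so edge samples are covered by overlapping frames
    xp = np.pad(x, (pad, pad))
    n_frames = 1 + (len(xp) - n_fft) // hop
    frames = np.stack([xp[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(S, win, hop, length):
    """Least-squares inverse STFT via windowed overlap-add."""
    n_fft = len(win)
    pad = n_fft - hop
    frames = np.fft.irfft(S, n=n_fft, axis=1) * win
    out = np.zeros((S.shape[0] - 1) * hop + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    out /= np.maximum(norm, 1e-12)  # guard against division by zero at the edges
    return out[pad:pad + length]    # drop the padding added in stft()

def inconsistency(S, win, hop, length):
    """Relative change of S after converting to the time domain and back."""
    S_proj = stft(istft(S, win, hop, length), win, hop)
    return np.linalg.norm(S_proj - S) / np.linalg.norm(S)

rng = np.random.default_rng(0)
win, hop = np.hanning(256), 128          # assumed analysis settings
x = rng.standard_normal(4096)            # stand-in for a noisy recording

X = stft(x, win, hop)                    # consistent: it comes from a real signal
mask = rng.uniform(0.0, 1.0, size=X.shape)  # hypothetical T-F mask
Y = mask * X                             # masked spectrogram, generally inconsistent

res_consistent = inconsistency(X, win, hop, len(x))
res_masked = inconsistency(Y, win, hop, len(x))
print("relative change, unmodified spectrogram:", res_consistent)
print("relative change, masked spectrogram:   ", res_masked)
```

The unmodified spectrogram survives the round trip up to floating-point error, while the masked one does not. A loss evaluated on `Y` therefore scores a spectrogram that differs from the one the output signal actually has, which is the motivation the abstract gives for defining the objective on the reconstructed time-domain signal.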

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Hearing Loss and Rehabilitation