Student-Teacher Learning for BLSTM Mask-based Speech Enhancement

Aswin Shanmugam Subramanian; Szu-Jui Chen; Shinji Watanabe

arXiv:1803.10013 · eess.AS · March 28, 2018 · 1 citation

TL;DR

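This paper introduces a student-teacher learning approach for single-channel speech enhancement. A teacher network that receives the beamformed multichannel signal produces soft masks, and these masks serve as additional training targets for a single-channel student network, with the aim of improving speech recognition accuracy.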

Contribution

It proposes a student-teacher paradigm in which a single-channel student network is trained to mimic the soft masks that a teacher network estimates from the beamformed multichannel signal, so that single-channel enhancement becomes better suited to ASR.

Findings

01. Improved speech recognition accuracy on CHiME-4 data

02. Effective mimicry of multichannel masks by single-channel network

03. Enhanced speech quality for ASR tasks

Abstract

Spectral mask estimation using bidirectional long short-term memory (BLSTM) neural networks has been widely used in various speech enhancement applications, and it has achieved great success when applied to multichannel enhancement techniques with a mask-based beamformer. However, when these masks are used for single-channel speech enhancement, they severely distort the speech signal and make it unsuitable for speech recognition. This paper proposes a student-teacher learning paradigm for single-channel speech enhancement. The beamformed signal from multichannel enhancement is given as input to the teacher network to obtain soft masks. An additional cross-entropy loss term with the soft mask target is combined with the original loss, so that the student network with single-channel input is trained to mimic the soft mask obtained with multichannel input through beamforming.…
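
The combined objective described in the abstract can be illustrated with a small NumPy sketch. The abstract does not say how the two loss terms are combined or what form the original loss takes, so everything here is an assumption made for illustration: the original loss is taken to be a binary cross-entropy against a hard mask target, the two terms are mixed with a hypothetical interpolation weight `weight`, and the function names and toy mask shapes are invented.

```python
import numpy as np


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted masks and targets in [0, 1]."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    per_bin = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(np.mean(per_bin))


def student_teacher_loss(student_mask, hard_target, teacher_soft_mask, weight=0.5):
    """Weighted sum of the original loss and the soft-mask mimic loss.

    `weight` is a hypothetical interpolation factor, not a value from the paper.
    """
    original = binary_cross_entropy(student_mask, hard_target)      # assumed original loss
    mimic = binary_cross_entropy(student_mask, teacher_soft_mask)   # added cross-entropy term
    return (1.0 - weight) * original + weight * mimic


# Toy data: 4 time frames x 6 frequency bins (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
frames, bins = 4, 6
student_mask = rng.uniform(0.05, 0.95, size=(frames, bins))       # student output, single-channel input
hard_target = (rng.uniform(size=(frames, bins)) > 0.5).astype(float)
teacher_soft_mask = rng.uniform(0.05, 0.95, size=(frames, bins))  # teacher output, beamformed input

loss = student_teacher_loss(student_mask, hard_target, teacher_soft_mask, weight=0.5)
print(f"combined loss: {loss:.4f}")
```

With `weight=0` the sketch reduces to the assumed original loss alone, and with `weight=1` the student is trained only to match the teacher's soft mask. In an actual training setup the teacher's masks would be computed from the beamformed signal and held fixed while the student is updated.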

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Indoor and Outdoor Localization Technologies