BridgeNets: Student-Teacher Transfer Learning Based on Recursive Neural Networks and its Application to Distant Speech Recognition

Jaeyoung Kim; Mostafa El-Khamy; Jungwon Lee

arXiv:1710.10224 · cs.CL · February 23, 2018

TL;DR

BridgeNet introduces a recursive student-teacher transfer learning framework utilizing multiple hints and intermediate features to significantly improve distant speech recognition accuracy in noisy environments.

Contribution

The paper presents a novel recursive architecture for student-teacher transfer learning that leverages multiple hints and intermediate features for enhanced speech denoising and recognition.

Findings

1. Achieved up to 13.24% relative WER reduction on the AMI corpus.

2. Demonstrated the effectiveness of multiple hints and of the recursive structure.

3. Improved distant speech recognition in noisy conditions.
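
For readers unfamiliar with the metric, "relative WER reduction" is the drop in word error rate expressed as a percentage of the baseline WER, not the absolute difference in percentage points. The sketch below shows only the arithmetic. The function name and the example numbers are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer, improved_wer):
    """Return the relative WER reduction in percent.

    Both arguments are word error rates in the same unit
    (for example, percent). Hypothetical helper for illustration.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# Hypothetical numbers, NOT results from the paper:
# a drop from 50.0% WER to 45.0% WER is 5 points absolute,
# which is a 10% relative reduction.
reduction = relative_wer_reduction(50.0, 45.0)
print(f"{reduction:.2f}% relative WER reduction")
```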

Abstract

Despite remarkable progress in automatic speech recognition, recognizing far-field speech mixed with various noise sources remains a challenging task. In this paper, we introduce BridgeNet, a novel student-teacher transfer learning framework that improves distant speech recognition. BridgeNet has two key features. First, it extends traditional student-teacher frameworks by providing multiple hints from a teacher network. Hints are not limited to the teacher's soft labels: the teacher's intermediate feature representations can better guide a student network in learning how to denoise or dereverberate noisy input. Second, the proposed recursive architecture can iteratively improve denoising and recognition performance. The experimental results of BridgeNet showed significant improvements in tackling the distant speech…
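
The idea of "multiple hints" can be illustrated with a small loss function that combines a soft-label term with one or more intermediate-feature terms. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes cross-entropy against the teacher's softened posteriors for the soft-label hint, mean squared error for each feature hint, and a single scalar weight. The names `multi_hint_loss`, `hint_weight`, and `temperature` are hypothetical, and the recursive architecture is not modeled here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student posteriors against the teacher's
    softened posteriors (assumed form of the soft-label hint)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


def feature_hint_loss(student_feat, teacher_feat):
    """Mean squared error between a student feature vector and the
    matching teacher feature vector (assumed form of a feature hint)."""
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def multi_hint_loss(student_logits, teacher_logits, feature_pairs,
                    hint_weight=0.5, temperature=1.0):
    """Soft-label loss plus a weighted MSE term per feature hint.

    feature_pairs is a list of (student_feat, teacher_feat) tuples,
    one per intermediate layer used as a hint.
    """
    loss = soft_label_loss(student_logits, teacher_logits, temperature)
    for student_feat, teacher_feat in feature_pairs:
        loss += hint_weight * feature_hint_loss(student_feat, teacher_feat)
    return loss


# Toy values only: the student sees noisy input, the teacher clean input.
student_logits = [1.2, 0.3, -0.5]
teacher_logits = [2.0, 0.1, -1.0]
hints = [([0.4, 0.9], [0.5, 1.0])]  # one intermediate-feature hint
print(multi_hint_loss(student_logits, teacher_logits, hints))
```

In this formulation the feature term vanishes when the student's intermediate representation matches the teacher's, so the student is pushed toward producing clean-looking features from noisy input while still matching the teacher's output distribution.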

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing