Recurrent Neural Network Training with Dark Knowledge Transfer

Zhiyuan Tang; Dong Wang; Zhiyong Zhang

arXiv:1505.04630·stat.ML·September 21, 2016

TL;DR

This paper shows that a deep neural network (DNN) can serve as a teacher whose knowledge is transferred to train recurrent neural networks, specifically LSTMs, for speech recognition with limited training data.

Contribution

The paper uses a DNN, a weaker model, as the teacher for training RNNs (specifically LSTMs), and reports that this works without special training tricks even when training data is limited.

Findings

01. RNNs trained with DNN teachers perform well in ASR tasks.
02. Knowledge transfer from weaker to stronger models is effective.
03. Training succeeds without additional tricks even with limited data.

Abstract

Recurrent neural networks (RNNs), particularly long short-term memory (LSTM), have gained much attention in automatic speech recognition (ASR). Although some success stories have been reported, training RNNs remains highly challenging, especially with limited training data. Recent research found that a well-trained model can be used as a teacher to train other child models, by using the predictions generated by the teacher model as supervision. This knowledge transfer learning has been employed to train simple neural nets with a complex one, so that the final performance can reach a level that is infeasible to obtain by regular training. In this paper, we employ the knowledge transfer learning approach to train RNNs (specifically, LSTM) using a deep neural network (DNN) model as the teacher. This is different from most of the existing research on knowledge transfer learning, since the…
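
The abstract describes supervising a child model with the predictions of a teacher model. A minimal sketch of one common way to do this is given here: the child is trained to minimize the cross entropy between the teacher's output distribution (soft targets) and its own. The temperature parameter, the function names, and the example logits are all assumptions made for illustration; the text above does not give the exact loss or settings used in the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross entropy between teacher posteriors (soft targets) and student posteriors."""
    targets = softmax(teacher_logits, temperature)
    predictions = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(targets, predictions))

# Hypothetical per-frame logits over three output classes (assumed values).
teacher_frame = [2.0, 1.0, 0.1]  # stands in for the DNN teacher's output
student_frame = [1.5, 1.2, 0.3]  # stands in for the LSTM child's output

# The loss is smallest when the child reproduces the teacher's distribution.
loss_mismatch = soft_target_loss(teacher_frame, student_frame, temperature=2.0)
loss_match = soft_target_loss(teacher_frame, teacher_frame, temperature=2.0)
```

In an actual ASR setup this loss would be averaged over all frames of the training data, and the child's gradients would come from an automatic differentiation framework rather than from plain Python lists.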
