Transferring Knowledge from a RNN to a DNN

William Chan; Nan Rosemary Ke; Ian Lane

arXiv:1504.01483 · cs.LG · April 8, 2015 · 55 citations


TL;DR

This paper presents a method for transferring knowledge from a high-performing RNN acoustic model to a small DNN for speech recognition. The method improves the accuracy of a DNN that is small enough to run on embedded systems.

Contribution

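It introduces a knowledge transfer technique in which an RNN generates soft alignments and a small DNN is trained on them by minimizing the Kullback-Leibler divergence. This improves the small DNN's performance on automatic speech recognition (ASR) tasks.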
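The training objective can be written as a frame-level KL divergence. The notation is ours, not the paper's: $q_t$ is the RNN's posterior distribution over output classes $s$ at frame $t$ (its soft alignment; for hybrid acoustic models these classes are typically HMM states), and $p_\theta(s \mid x_t)$ is the small DNN's output for its input $x_t$ at that frame. Expanding the divergence separates a cross-entropy term from a term that does not depend on $\theta$:

```latex
\mathcal{L}(\theta)
  = \sum_{t=1}^{T} D_{\mathrm{KL}}\big(q_t \,\|\, p_\theta(\cdot \mid x_t)\big)
  = \sum_{t=1}^{T} \sum_{s} q_t(s) \log \frac{q_t(s)}{p_\theta(s \mid x_t)}
  = -\sum_{t=1}^{T} \sum_{s} q_t(s) \log p_\theta(s \mid x_t) \;-\; \sum_{t=1}^{T} H(q_t),
\qquad H(q_t) = -\sum_{s} q_t(s) \log q_t(s).
```

Because $H(q_t)$ is fixed by the RNN, minimizing the KL divergence is equivalent to minimizing cross-entropy against the soft targets. When $q_t$ is one-hot (a hard alignment), $H(q_t) = 0$ and the objective reduces to standard frame-level cross-entropy.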

Findings

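1. The small DNN achieved a word error rate (WER) of 3.93% on WSJ eval92.
2. Compared with the 4.54% WER baseline, this is a relative improvement of more than 13%.
3. The approach targets embedded systems with limited computational capacity, where large DNN and RNN models are impractical to deploy.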
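The "more than 13%" figure is a relative WER reduction. It follows directly from the two reported numbers:

```latex
\text{relative improvement}
  = \frac{\mathrm{WER}_{\text{baseline}} - \mathrm{WER}_{\text{RNN-trained}}}{\mathrm{WER}_{\text{baseline}}}
  = \frac{4.54 - 3.93}{4.54}
  = \frac{0.61}{4.54}
  \approx 0.134 = 13.4\%
```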

Abstract

Deep Neural Network (DNN) acoustic models have yielded many state-of-the-art results in Automatic Speech Recognition (ASR) tasks. More recently, Recurrent Neural Network (RNN) models have been shown to outperform their DNN counterparts. However, state-of-the-art DNN and RNN models tend to be impractical to deploy on embedded systems with limited computational capacity. Traditionally, the approach for embedded platforms is to either train a small DNN directly, or to train a small DNN that learns the output distribution of a large DNN. In this paper, we utilize a state-of-the-art RNN to transfer knowledge to a small DNN. We use the RNN model to generate soft alignments and minimize the Kullback-Leibler divergence against the small DNN. The small DNN trained on the soft RNN alignments achieved a 3.93 WER on the Wall Street Journal (WSJ) eval92 task compared to a baseline 4.54 WER or more than 13%…
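A minimal training sketch of the idea in the abstract follows. It rests on several assumptions. The "small DNN" is replaced by a single softmax layer (`student_forward`). The RNN soft alignments are fixed, made-up probability vectors (`soft_targets`). Training is plain per-frame SGD. All function names are hypothetical, and none of the paper's architecture, features, or hyperparameters are reproduced. The update uses the fact that the gradient of KL(q || softmax(z)) with respect to the logits z is softmax(z) - q.

```python
import math
import random

def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(q, p, eps=1e-12):
    # D_KL(q || p) for one frame; classes with q(s) = 0 contribute nothing.
    return sum(qs * math.log(qs / max(ps, eps)) for qs, ps in zip(q, p) if qs > 0)

def student_forward(weights, bias, x):
    # Hypothetical stand-in for the small DNN: one linear layer plus softmax.
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return softmax(logits)

def train_on_soft_alignments(frames, soft_targets, num_states, lr=0.5, epochs=200, seed=0):
    # Per-frame SGD minimizing KL(teacher posterior || student posterior).
    rng = random.Random(seed)
    dim = len(frames[0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_states)]
    bias = [0.0] * num_states
    for _ in range(epochs):
        for x, q in zip(frames, soft_targets):
            p = student_forward(weights, bias, x)
            # dKL/dlogit_s = p(s) - q(s) for a softmax output.
            grad = [ps - qs for ps, qs in zip(p, q)]
            for s in range(num_states):
                for d in range(dim):
                    weights[s][d] -= lr * grad[s] * x[d]
                bias[s] -= lr * grad[s]
    return weights, bias

def mean_kl(weights, bias, frames, soft_targets):
    # Average frame-level KL divergence between teacher and student.
    return sum(kl_divergence(q, student_forward(weights, bias, x))
               for x, q in zip(frames, soft_targets)) / len(frames)

# Toy data: 3 frames of 2-D features and hypothetical teacher (RNN) posteriors over 3 states.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
soft_targets = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
w0, b0 = train_on_soft_alignments(frames, soft_targets, num_states=3, epochs=0)  # untrained
w, b = train_on_soft_alignments(frames, soft_targets, num_states=3)
print("KL before:", mean_kl(w0, b0, frames, soft_targets))
print("KL after: ", mean_kl(w, b, frames, soft_targets))
```

If the one-hot targets of a forced alignment replace `soft_targets`, the same loop becomes ordinary frame-level cross-entropy training on hard alignments. Soft targets additionally carry the teacher's uncertainty across neighboring classes.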


Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing