Progressive Label Distillation: Learning Input-Efficient Deep Neural Networks
Zhong Qiu Lin, Alexander Wong

TL;DR
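This paper introduces progressive label distillation, which uses a series of teacher-student network pairs to progressively distill training data for deep neural networks with greatly reduced input dimensions. It is demonstrated on limited-vocabulary speech recognition, where 500ms input utterances are distilled from 1000ms source training data.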
Contribution
The study proposes progressive label distillation, a technique for learning networks with greatly reduced input dimensions. It extends knowledge distillation beyond teacher-to-student model compression to distilling the knowledge encapsulated in the training data itself.
Findings
78% test accuracy increase over direct learning
Effective data distillation for input-efficient networks
Applicable to speech recognition tasks
Abstract
Much of the focus in the area of knowledge distillation has been on distilling knowledge from a larger teacher network to a smaller student network. However, there has been little research on how the concept of distillation can be leveraged to distill the knowledge encapsulated in the training data itself into a reduced form. In this study, we explore the concept of progressive label distillation, where we leverage a series of teacher-student network pairs to progressively generate distilled training data for learning deep neural networks with greatly reduced input dimensions. To investigate the efficacy of the proposed progressive label distillation approach, we experimented with learning a deep limited vocabulary speech recognition network based on generated 500ms input utterances distilled progressively from 1000ms source training data, and demonstrated a significant increase in test…
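The progressive teacher-student loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-centroid classifier stands in for a deep network, the window masking used to let the teacher label shortened inputs is one possible choice, and all names (`fit_centroids`, `predict`, `random_crop`, `progressive_label_distillation`) and the length schedule are hypothetical.

```python
import numpy as np

def fit_centroids(x, y, num_classes):
    """Stand-in 'network' (assumption): one mean vector per class."""
    centroids = np.zeros((num_classes, x.shape[1]))
    for c in range(num_classes):
        mask = y == c
        if mask.any():
            centroids[c] = x[mask].mean(axis=0)
    return centroids

def predict(centroids, x):
    """Assign each row of x to the class with the nearest centroid."""
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def random_crop(x, length, rng):
    """Cut one random window of `length` samples out of every row."""
    starts = rng.integers(0, x.shape[1] - length + 1, size=x.shape[0])
    idx = starts[:, None] + np.arange(length)[None, :]
    return np.take_along_axis(x, idx, axis=1), starts

def progressive_label_distillation(x, y, lengths, num_classes, seed=0):
    """Shrink the input length stage by stage (hypothetical schedule).

    At each stage the current teacher labels the shortened inputs and a
    student is fitted on those distilled labels; the student then serves
    as the teacher for the next, shorter stage.
    """
    rng = np.random.default_rng(seed)
    teacher = fit_centroids(x, y, num_classes)
    data, labels = x, y
    for length in lengths:
        cropped, starts = random_crop(data, length, rng)
        # Assumption: the teacher sees the cropped window in place,
        # with everything outside the window zeroed out.
        pos = np.arange(data.shape[1])[None, :]
        window = (pos >= starts[:, None]) & (pos < starts[:, None] + length)
        labels = predict(teacher, data * window)
        # The student learns from the teacher's labels on the shorter input.
        teacher = fit_centroids(cropped, labels, num_classes)
        data = cropped
    return teacher, data, labels
```

Calling `progressive_label_distillation(x, y, [90, 80, 70, 60, 50], num_classes=2)` on 100-sample inputs would mimic halving the input length in five stages. Both the stage count and the step size here are illustrative; the abstract specifies only the 1000ms source length and the 500ms target length.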
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: Knowledge Distillation
