uDistil-Whisper: Label-Free Data Filtering for Knowledge Distillation in Low-Data Regimes
Abdul Waheed, Karima Kadaoui, Bhiksha Raj, Muhammad Abdul-Mageed

TL;DR
This paper introduces uDistil-Whisper, a label-free data filtering method for knowledge distillation that improves low-resource speech recognition models without requiring labeled data. The distilled models outperform the teacher model and match or outperform comparable supervised data filtering setups.
Contribution
The paper presents a novel label-free data filtering framework for distillation that eliminates the need for ground truth labels, enabling effective low-resource speech model training.
Findings
The best distilled models outperform the teacher model by 5-7 WER points.
Distilled models are 25-50% more compute- and memory-efficient.
Distilled models are on par with or outperform comparable supervised data filtering setups.
Abstract
Recent work on distilling Whisper's knowledge into small models using pseudo-labels shows promising performance while reducing the size by up to 50%. This results in small, efficient, and dedicated models. However, a critical step of distillation using pseudo-labels involves filtering high-quality predictions and using only those during training. This step requires ground truth labels to compare with and filter low-quality examples, making the process dependent on human labels. Additionally, the distillation process requires a large amount of data, thereby limiting its applicability in low-resource settings. To address this, we propose a distillation framework that does not require any labeled data. Through experimentation, we show that our best distilled models outperform the teacher model by 5-7 WER points and are on par with or outperform similar supervised data filtering setups. When…
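The ground-truth-dependent filtering step that the abstract describes as the existing practice can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary keys ("reference", "pseudo_label"), and the 0.2 threshold are all assumptions made for the example. It computes word error rate (WER) between each teacher pseudo-label and its human reference and keeps only the examples under the threshold. The label-free method proposed in the paper replaces this comparison with criteria that need no reference; those criteria are not detailed in this summary and are not shown here.

```python
from typing import Dict, List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        # Convention assumed here: empty reference is perfect only if hypothesis is empty too.
        return 0.0 if not hyp else 1.0
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def filter_pseudo_labels(
    examples: List[Dict[str, str]], max_wer: float = 0.2
) -> List[Dict[str, str]]:
    """Keep examples whose pseudo-label is close to the human reference.

    `max_wer` is a hypothetical threshold chosen for illustration only.
    """
    return [
        ex
        for ex in examples
        if word_error_rate(ex["reference"], ex["pseudo_label"]) <= max_wer
    ]


examples = [
    {"reference": "the cat sat on the mat", "pseudo_label": "the cat sat on the mat"},
    {"reference": "the cat sat on the mat", "pseudo_label": "a dog stood by a door"},
]
kept = filter_pseudo_labels(examples)
```

In this toy input only the first example survives, since its pseudo-label matches the reference exactly while the second differs in every word. The sketch makes the dependency on human labels concrete: without the "reference" field there is nothing to compare against, which is the limitation the paper sets out to remove.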
Taxonomy
Topics: Machine Learning and Data Classification · Rough Sets and Fuzzy Logic
