DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models
Yifan Peng, Yui Sudo, Shakeel Muhammad, Shinji Watanabe

TL;DR
DPHuBERT is a task-agnostic compression method for speech SSL models that performs distillation and structured pruning jointly. The resulting smaller models outperform pure distillation on almost all SUPERB tasks and perform well even with limited training data.
Contribution
Introduces DPHuBERT, a task-agnostic compression method for speech SSL models based on joint distillation and structured pruning, which avoids committing to a manually designed student architecture that stays fixed during training.
Findings
Outperforms pure distillation methods on almost all SUPERB tasks
Requires little training time and performs well with limited training data
Applicable to various speech SSL models
Abstract
Self-supervised learning (SSL) has achieved notable success in many speech processing tasks, but the large model size and heavy computational cost hinder the deployment. Knowledge distillation trains a small student model to mimic the behavior of a large teacher model. However, the student architecture usually needs to be manually designed and will remain fixed during training, which requires prior knowledge and can lead to suboptimal performance. Inspired by recent success of task-specific structured pruning, we propose DPHuBERT, a novel task-agnostic compression method for speech SSL based on joint distillation and pruning. Experiments on SUPERB show that DPHuBERT outperforms pure distillation methods in almost all tasks. Moreover, DPHuBERT requires little training time and performs well with limited training data, making it suitable for resource-constrained applications. Our method…
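The abstract names the two ingredients, distillation and structured pruning, without giving formulas. The toy sketch below only illustrates how the two can share one objective; it is not the paper's formulation. The L1 feature loss, the row-wise mask, the absolute-gap sparsity penalty, and every function name are assumptions made for illustration.

```python
from typing import List


def l1_distill_loss(teacher: List[float], student: List[float]) -> float:
    """Mean absolute difference between teacher and student features (assumed loss)."""
    if len(teacher) != len(student):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(t - s) for t, s in zip(teacher, student)) / len(teacher)


def apply_structured_mask(weights: List[List[float]], mask: List[int]) -> List[List[float]]:
    """Structured pruning: zero out whole rows (units) whose mask entry is 0."""
    return [[w * m for w in row] for row, m in zip(weights, mask)]


def sparsity(mask: List[int]) -> float:
    """Fraction of units removed by the mask."""
    return 1.0 - sum(mask) / len(mask)


def joint_objective(
    teacher: List[float],
    student: List[float],
    mask: List[int],
    target_sparsity: float,
    penalty_weight: float,
) -> float:
    """Distillation loss plus a penalty on the gap to the target sparsity (hypothetical form)."""
    gap = sparsity(mask) - target_sparsity
    return l1_distill_loss(teacher, student) + penalty_weight * abs(gap)


# Toy values, chosen only to make the arithmetic easy to follow.
teacher_feats = [1.0, 2.0, 3.0, 4.0]
student_feats = [1.0, 1.5, 3.0, 3.0]
unit_mask = [1, 0, 1, 1]  # one of four units pruned
loss = joint_objective(teacher_feats, student_feats, unit_mask,
                       target_sparsity=0.5, penalty_weight=2.0)
```

In this sketch the mask is fixed by hand. In a joint method the mask would be learned together with the student weights, which is what lets the student architecture change during training instead of being fixed in advance.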
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Knowledge Distillation
