DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models

Yifan Peng; Yui Sudo; Shakeel Muhammad; Shinji Watanabe

arXiv:2305.17651 · cs.CL · May 30, 2023 · 1 citation

TL;DR

DPHuBERT is a task-agnostic compression method for speech SSL models that applies distillation and structured pruning jointly. The resulting smaller models outperform pure distillation methods on almost all SUPERB tasks, need little training time, and still perform well when training data is limited.

Contribution

Introduces DPHuBERT, a task-agnostic compression method for speech SSL models that performs distillation and structured pruning jointly, so the student architecture no longer has to be designed by hand and kept fixed during training.

Findings

01 Outperforms pure distillation methods on almost all SUPERB tasks

02 Requires little training time and performs well with limited training data

03 Applicable to various speech SSL models

Abstract

Self-supervised learning (SSL) has achieved notable success in many speech processing tasks, but the large model size and heavy computational cost hinder the deployment. Knowledge distillation trains a small student model to mimic the behavior of a large teacher model. However, the student architecture usually needs to be manually designed and will remain fixed during training, which requires prior knowledge and can lead to suboptimal performance. Inspired by recent success of task-specific structured pruning, we propose DPHuBERT, a novel task-agnostic compression method for speech SSL based on joint distillation and pruning. Experiments on SUPERB show that DPHuBERT outperforms pure distillation methods in almost all tasks. Moreover, DPHuBERT requires little training time and performs well with limited training data, making it suitable for resource-constrained applications. Our method…
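To make the idea of combining distillation with pruning concrete, here is a minimal toy sketch in plain Python. It is not the DPHuBERT implementation: the function names, the one-weight-per-unit "student", the absolute-difference distillation loss, and the form of the sparsity penalty are all assumptions chosen for illustration. It only shows how a single objective can reward a student for matching the teacher while also pushing it toward a target sparsity.

```python
# Illustrative sketch only; NOT the DPHuBERT implementation.
# All names and the exact form of the objective are hypothetical.

def masked_student(weights, mask, x):
    """Toy student layer: unit i outputs mask[i] * weights[i] * x."""
    return [m * w * x for m, w in zip(mask, weights)]

def distill_loss(teacher, student):
    """Mean absolute difference between teacher and student features."""
    return sum(abs(t - s) for t, s in zip(teacher, student)) / len(teacher)

def sparsity(mask):
    """Fraction of units that are pruned (mask value 0)."""
    return sum(1 for m in mask if m == 0) / len(mask)

def joint_objective(teacher, weights, mask, x, target_sparsity, lam):
    """Distillation loss plus a penalty for missing the target sparsity."""
    student = masked_student(weights, mask, x)
    penalty = lam * abs(sparsity(mask) - target_sparsity)
    return distill_loss(teacher, student) + penalty

# Toy data: the teacher is silent on units 1 and 3, so pruning them is free.
teacher = [1.0, 0.0, 2.0, 0.0]
weights = [1.0, 0.5, 2.0, 0.25]

dense = joint_objective(teacher, weights, [1, 1, 1, 1], 1.0, 0.5, 1.0)
pruned = joint_objective(teacher, weights, [1, 0, 1, 0], 1.0, 0.5, 1.0)
print(dense, pruned)
```

In this toy setting the pruned mask scores lower than the dense one, because it both matches the teacher and meets the sparsity target. A real system would learn the mask and the weights together by gradient descent rather than compare hand-picked masks.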

Code & Models

Repositories

pyf98/dphubert (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques

Methods: Knowledge Distillation