DiceHuBERT: Distilling HuBERT with a Self-Supervised Learning Objective

Hyung Gun Chi; Zakaria Aldeneh; Tatiana Likhomanenko; Oggi Rudovic; Takuya Higuchi; Li-Wei Chen; Shinji Watanabe; Ahmed Hussen Abdelaziz

arXiv:2507.02911 · cs.LG · July 8, 2025

TL;DR

DiceHuBERT is a knowledge distillation framework that compresses HuBERT by directly replacing the original model with a student model in HuBERT's iterative self-distillation and training that student with the same SSL objective. On SUPERB, it improves phoneme recognition by over 21% and ASR performance by more than 14%.

Contribution

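It introduces a distillation approach that leverages HuBERT's iterative self-distillation mechanism, avoiding the layer-wise and feature-wise teacher-student mappings that existing methods rely on and requiring no additional modules or architectural constraints.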

Findings

01 Over 21% improvement in phoneme recognition on SUPERB

02 More than 14% improvement in ASR performance on SUPERB

03 Consistently outperforms existing distillation methods on SUPERB

Abstract

We introduce DiceHuBERT, a knowledge distillation framework for compressing HuBERT, a widely used self-supervised learning (SSL)-based speech foundation model. Unlike existing distillation methods that rely on layer-wise and feature-wise mapping between teacher and student models, DiceHuBERT leverages HuBERT's iterative self-distillation mechanism by directly replacing the original model with a student model. This replacement allows the student to be trained using the same SSL objective used when pre-training HuBERT, eliminating the need for additional modules or architectural constraints. Experimental results on SUPERB show that DiceHuBERT consistently outperforms existing distillation methods, improving phoneme recognition performance by over 21% and ASR performance by more than 14%. Furthermore, DiceHuBERT demonstrates competitive performance across multiple tasks, highlighting its…
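
The abstract does not spell out the training objective, so the following is only a rough sketch of the idea it describes. It assumes the standard HuBERT recipe: per-frame features from the previous-iteration model (here, the teacher) are clustered into discrete pseudo-labels, and the model being trained (here, the student) predicts those labels at masked frames with a cross-entropy loss. All function names, centroids, features, and logits below are hypothetical toy values, not taken from the paper. Real HuBERT also derives its logits from cosine similarity with learned codeword embeddings, which this sketch omits.

```python
import math

def nearest_centroid(frame, centroids):
    """Index of the centroid closest to `frame` (squared Euclidean distance)."""
    dists = [sum((x - c) ** 2 for x, c in zip(frame, centroid))
             for centroid in centroids]
    return dists.index(min(dists))

def pseudo_labels(teacher_features, centroids):
    """Map per-frame teacher features to discrete cluster targets."""
    return [nearest_centroid(f, centroids) for f in teacher_features]

def masked_prediction_loss(student_logits, targets, mask):
    """Mean cross-entropy between student logits and targets, masked frames only."""
    total, count = 0.0, 0
    for logits, target, is_masked in zip(student_logits, targets, mask):
        if not is_masked:
            continue
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_z - logits[target]
        count += 1
    return total / count if count else 0.0

# Toy data (assumed): 4 frames, 2-dimensional teacher features, 2 clusters.
teacher_features = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [0.0, 0.2]]
centroids = [[0.0, 0.0], [1.0, 1.0]]
targets = pseudo_labels(teacher_features, centroids)

mask = [False, True, True, False]  # frames hidden from the student
student_logits = [[0.0, 0.0], [0.0, 2.0], [0.0, 2.0], [0.0, 0.0]]
loss = masked_prediction_loss(student_logits, targets, mask)
print(targets, round(loss, 4))
```

Under these assumptions the student needs no layer-wise or feature-wise alignment with the teacher: the teacher contributes only the discrete targets, which is consistent with the abstract's claim that no additional modules or architectural constraints are required.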
