LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

Rui Wang; Qibing Bai; Junyi Ao; Long Zhou; Zhixiang Xiong; Zhihua Wei; Yu Zhang; Tom Ko; Haizhou Li

arXiv:2203.15610 · eess.AS · June 22, 2022

TL;DR

LightHuBERT introduces a flexible, highly compressed speech representation model that maintains high performance across tasks while significantly reducing model size through a novel architecture search and distillation strategy.

Contribution

It proposes a once-for-all Transformer compression framework for speech models, enabling automatic architecture search and substantial parameter reduction.

Findings

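01 Outperforms the original HuBERT on ASR and five SUPERB tasks at the HuBERT model size.

02 Achieves a 3.5x compression ratio on three SUPERB tasks (automatic speaker verification, keyword spotting, and intent classification) with a slight accuracy loss.

03 Performs comparably to the teacher model on most tasks with 29% fewer parameters.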

Abstract

Self-supervised speech representation learning has shown promising results in various speech processing tasks. However, the pre-trained models, e.g., HuBERT, are storage-intensive Transformers, limiting their scope of applications under low-resource settings. To this end, we propose LightHuBERT, a once-for-all Transformer compression framework, to find the desired architectures automatically by pruning structured parameters. More precisely, we create a Transformer-based supernet that is nested with thousands of weight-sharing subnets and design a two-stage distillation strategy to leverage the contextualized latent representations from HuBERT. Experiments on automatic speech recognition (ASR) and the SUPERB benchmark show the proposed LightHuBERT enables over $10^{9}$ architectures concerning the embedding dimension, attention dimension, head number, feed-forward network ratio, and…
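The weight-sharing idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes that each subnet reuses the leading rows and columns of a single supernet weight matrix, which is one common way to realize weight sharing in once-for-all training. The class `SharedLinear`, the helper `sample_subnet`, and the dimensions in `SEARCH_SPACE` are hypothetical names and values chosen for illustration only.

```python
import random


class SharedLinear:
    """Toy supernet linear layer (hypothetical, for illustration).

    Every subnet uses the top-left block of one shared weight matrix,
    so smaller subnets are nested inside larger ones.
    """

    def __init__(self, max_in, max_out, seed=0):
        rng = random.Random(seed)
        self.max_in = max_in
        self.max_out = max_out
        # Shared parameters: max_out rows, each with max_in columns.
        self.weight = [
            [rng.uniform(-0.1, 0.1) for _ in range(max_in)]
            for _ in range(max_out)
        ]
        self.bias = [0.0] * max_out

    def forward(self, x, out_dim):
        """Apply the subnet defined by len(x) inputs and out_dim outputs."""
        in_dim = len(x)
        if not (0 < in_dim <= self.max_in and 0 < out_dim <= self.max_out):
            raise ValueError("subnet dimensions exceed the supernet")
        return [
            sum(w * v for w, v in zip(self.weight[o][:in_dim], x)) + self.bias[o]
            for o in range(out_dim)
        ]


def sample_subnet(rng, search_space):
    """Pick one option per searchable dimension, uniformly at random."""
    return {name: rng.choice(options) for name, options in search_space.items()}


# Assumed toy search space; the paper's actual choices are not given here.
SEARCH_SPACE = {"embed_dim": [4, 6, 8], "ffn_dim": [8, 12, 16]}

layer = SharedLinear(max_in=8, max_out=16)
config = sample_subnet(random.Random(1), SEARCH_SPACE)
x = [1.0] * config["embed_dim"]
y = layer.forward(x, config["ffn_dim"])
print(config, len(y))
```

Because all subnets read from the same matrix, the first outputs of a wide subnet match the outputs of a narrower one for the same input. This nesting is what would let a single trained supernet yield many deployable architectures without retraining each from scratch.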

Code & Models

Repositories

mechanicalsea/lighthubert
PyTorch · Official

Models

🤗 mechanicalsea/lighthubert · model · ♡ 5

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing

Methods: Attention Is All You Need · Pruning · Linear Layer · Residual Connection · Softmax · Dropout · Position-Wise Feed-Forward Layer · Dense Connections · Byte Pair Encoding · Label Smoothing