Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation

Kangwook Jang; Sungnyun Kim; Se-Young Yun; Hoirin Kim

arXiv:2305.11685 · eess.AS · October 27, 2023 · 1 citation

TL;DR

This paper introduces a universal compression method for transformer-based speech SSL models that reuses attention maps and employs a novel masking distillation technique, resulting in compact models with competitive speech recognition performance.

Contribution

It proposes attention map reuse and a masking distillation strategy to effectively compress speech SSL models without significant performance loss.
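As a rough illustration of attention map reuse, the NumPy sketch below builds a small stack of single-head self-attention layers in which some layers skip their own query and key projections and apply the attention map computed by the most recent layer that has them. This is a minimal sketch, not the authors' implementation: the `AttentionLayer` class, the `run_stack` helper, the single-head simplification, and the alternating reuse pattern are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class AttentionLayer:
    """Hypothetical single-head self-attention layer.

    A layer with reuse=True holds no query/key weights and instead
    applies an attention map handed over from an earlier layer.
    """

    def __init__(self, dim, rng, reuse=False):
        self.dim = dim
        self.reuse = reuse
        names = ("w_v", "w_o") if reuse else ("w_q", "w_k", "w_v", "w_o")
        self.weights = {
            n: rng.standard_normal((dim, dim)) * dim ** -0.5 for n in names
        }

    def num_params(self):
        return sum(w.size for w in self.weights.values())

    def __call__(self, x, attn=None):
        w = self.weights
        if self.reuse:
            if attn is None:
                raise ValueError("a reuse layer needs an earlier attention map")
        else:
            # Compute a fresh attention map from this layer's own Q and K.
            q, k = x @ w["w_q"], x @ w["w_k"]
            attn = softmax(q @ k.T / np.sqrt(self.dim))
        # Value and output projections exist in every layer; residual added.
        out = x + (attn @ (x @ w["w_v"])) @ w["w_o"]
        return out, attn

def run_stack(layers, x):
    # Pass the latest attention map forward so reuse layers can apply it.
    attn = None
    for layer in layers:
        x, attn = layer(x, attn)
    return x, attn

rng = np.random.default_rng(0)
dim, frames = 8, 5
# Assumption: every second layer reuses the map of the layer before it.
layers = [AttentionLayer(dim, rng, reuse=(i % 2 == 1)) for i in range(4)]
baseline = [AttentionLayer(dim, rng, reuse=False) for _ in range(4)]

out, last_attn = run_stack(layers, rng.standard_normal((frames, dim)))
reused_params = sum(layer.num_params() for layer in layers)
baseline_params = sum(layer.num_params() for layer in baseline)
print(out.shape, reused_params, baseline_params)  # (5, 8) 768 1024
```

In this toy setup the number of layers is unchanged, while the attention weights shrink from 1024 to 768 because two of the four layers no longer carry query and key matrices. The count covers only the attention projections in the sketch and says nothing about the actual model sizes reported in the paper.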

Findings

01

Achieves a phoneme error rate (PER) of 7.72% and a word error rate (WER) of 9.96% on the SUPERB benchmark.

02

Reduces model complexity by reusing attention maps across Transformer layers, which removes key and query parameters while retaining the number of layers.

Abstract

Transformer-based speech self-supervised learning (SSL) models, such as HuBERT, show surprising performance in various speech processing tasks. However, the huge number of parameters in speech SSL models necessitates compression to a more compact model for wider usage in academia or small companies. In this study, we propose reusing attention maps across the Transformer layers, so as to remove key and query parameters while retaining the number of layers. Furthermore, we propose a novel masking distillation strategy to improve the student model's speech representation quality. We extend the distillation loss to utilize both masked and unmasked speech frames to fully leverage the teacher model's high-quality representation. Our universal compression strategy yields a student model that achieves a phoneme error rate (PER) of 7.72% and a word error rate (WER) of 9.96% on the SUPERB benchmark.
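The idea of a distillation loss that uses both masked and unmasked frames can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's exact objective: the function name `masking_distillation_loss`, the squared-error distance, and the `unmasked_weight` parameter are hypothetical choices made for the example.

```python
import numpy as np

def masking_distillation_loss(student, teacher, mask, unmasked_weight=1.0):
    """Hypothetical loss combining masked and unmasked frame terms.

    student, teacher: arrays of shape (frames, dim) with frame-level
        representations from the student and teacher models.
    mask: boolean array of shape (frames,), True where the student's
        input frame was masked.
    unmasked_weight: assumed scalar balancing the unmasked term.
    """
    student = np.asarray(student, dtype=float)
    teacher = np.asarray(teacher, dtype=float)
    mask = np.asarray(mask, dtype=bool)

    # Per-frame mean squared error between student and teacher.
    per_frame = ((student - teacher) ** 2).mean(axis=-1)

    # Average separately over masked and unmasked frames so that neither
    # group dominates just because it contains more frames.
    masked_term = per_frame[mask].mean() if mask.any() else 0.0
    unmasked_term = per_frame[~mask].mean() if (~mask).any() else 0.0
    return float(masked_term + unmasked_weight * unmasked_term)

teacher_repr = np.zeros((4, 2))
student_repr = np.array([[1.0, 1.0], [0.0, 0.0], [2.0, 2.0], [0.0, 0.0]])

loss_a = masking_distillation_loss(
    student_repr, teacher_repr, mask=[True, False, True, False]
)
loss_b = masking_distillation_loss(
    student_repr, teacher_repr, mask=[True, True, False, False],
    unmasked_weight=0.5,
)
print(loss_a, loss_b)  # 2.5 1.5
```

Setting `unmasked_weight` to zero in this sketch recovers a loss that looks only at masked frames, which makes the contribution of the unmasked term easy to isolate.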

Code & Models

Repositories

sungnyun/armhubert
PyTorch · Official

Models

sungnyun/ARMHuBERT
Hugging Face model · 3 likes

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Adam · Residual Connection · Absolute Position Encodings · Softmax · Layer Normalization