Is Smaller Always Faster? Tradeoffs in Compressing Self-Supervised Speech Transformers

Tzu-Quan Lin; Tsung-Huan Yang; Chun-Yao Chang; Kuang-Ming Chen; Tzu-hsun Feng; Hung-yi Lee; Hao Tang

arXiv:2211.09949 · cs.CL · August 19, 2025 · 1 citation

TL;DR

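This paper systematically compares four compression techniques for self-supervised speech Transformers (weight pruning, head pruning, low-rank approximation, and knowledge distillation), measuring parameter count, multiply-accumulate operations, and real-time factor to guide deployment in real-world applications.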

Contribution

The paper evaluates multiple compression methods for speech Transformer models under a shared set of metrics, highlighting their tradeoffs and practical benefits.

Findings

01

Each compression method offers unique advantages.

02

Evaluation metrics reveal different tradeoffs in size, speed, and accuracy.

03

The study offers practical guidance for deploying compressed speech Transformers.

Abstract

Transformer-based self-supervised models have achieved remarkable success in speech processing, but their large size and high inference cost present significant challenges for real-world deployment. While numerous compression techniques have been proposed, inconsistent evaluation metrics make it difficult to compare their practical effectiveness. In this work, we conduct a comprehensive study of four common compression methods on self-supervised speech Transformers: weight pruning, head pruning, low-rank approximation, and knowledge distillation. We evaluate each method under three key metrics: parameter count, multiply-accumulate operations, and real-time factor. Results show that each method offers distinct advantages. In addition, we contextualize recent compression techniques, comparing DistilHuBERT, FitHuBERT, LightHuBERT, ARMHuBERT, and STaRHuBERT under the same…
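The three metrics named in the abstract can be illustrated with a small sketch. This is not the authors' measurement code: the layer sizes, the rank, the function names, and the dummy workload below are all hypothetical, chosen only to show how parameter count and multiply-accumulate operations (MACs) change when one dense layer is replaced by a low-rank factorization, and how real-time factor is usually defined (processing time divided by audio duration).

```python
import time

def dense_cost(d_in, d_out):
    """Parameters and per-frame MACs of a bias-free dense layer (d_in x d_out)."""
    params = d_in * d_out
    macs = d_in * d_out  # one multiply-accumulate per weight, per input frame
    return params, macs

def low_rank_cost(d_in, d_out, rank):
    """Same layer factorized as (d_in x rank) followed by (rank x d_out)."""
    params = rank * (d_in + d_out)
    macs = rank * (d_in + d_out)
    return params, macs

def real_time_factor(process, audio_seconds):
    """Wall-clock processing time divided by the duration of the audio."""
    start = time.perf_counter()
    process()
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds

# Hypothetical sizes, not taken from the paper.
D_IN, D_OUT, RANK = 768, 3072, 128

dense_params, dense_macs = dense_cost(D_IN, D_OUT)
lr_params, lr_macs = low_rank_cost(D_IN, D_OUT, RANK)

# Dummy workload standing in for model inference on 10 seconds of audio.
rtf = real_time_factor(lambda: sum(range(100_000)), audio_seconds=10.0)

print(f"dense:    {dense_params} params, {dense_macs} MACs per frame")
print(f"low-rank: {lr_params} params, {lr_macs} MACs per frame")
print(f"real-time factor of the dummy workload: {rtf:.6f}")
```

In this toy case parameter count and MACs shrink by the same ratio, while real-time factor depends on the hardware and the implementation and has to be measured. That is one reason the three metrics are reported separately.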

Code & Models

Repositories

nervjack2/speech-ssl-compression (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques