Is Smaller Always Faster? Tradeoffs in Compressing Self-Supervised Speech Transformers
Tzu-Quan Lin, Tsung-Huan Yang, Chun-Yao Chang, Kuang-Ming Chen, Tzu-hsun Feng, Hung-yi Lee, Hao Tang

TL;DR
This paper systematically compares four compression techniques for self-supervised speech Transformers, evaluating their efficiency and practicality to guide deployment in real-world applications.
Contribution
It provides a comprehensive evaluation framework for multiple compression methods, highlighting their tradeoffs and practical benefits in speech Transformer models.
Findings
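Each of the four compression methods (weight pruning, head pruning, low-rank approximation, and knowledge distillation) offers distinct advantages.
The three evaluation metrics (parameter count, multiply-accumulate operations, and real-time factor) reveal different tradeoffs in size, speed, and accuracy.
The comparison offers practical guidance for deploying compressed speech Transformers.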
Abstract
Transformer-based self-supervised models have achieved remarkable success in speech processing, but their large size and high inference cost present significant challenges for real-world deployment. While numerous compression techniques have been proposed, inconsistent evaluation metrics make it difficult to compare their practical effectiveness. In this work, we conduct a comprehensive study of four common compression methods, including weight pruning, head pruning, low-rank approximation, and knowledge distillation on self-supervised speech Transformers. We evaluate each method under three key metrics: parameter count, multiply-accumulate operations, and real-time factor. Results show that each method offers distinct advantages. In addition, we contextualize recent compression techniques, comparing DistilHuBERT, FitHuBERT, LightHuBERT, ARMHuBERT, and STaRHuBERT under the same…
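The three metrics named in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code. It only shows, under stated assumptions, how parameter count and multiply-accumulate operations (MACs) change when one dense linear layer is replaced by a rank-r factorization, and how a real-time factor (RTF) is computed as processing time divided by audio duration. The layer size (768 by 3072), the 500-frame input, and the helper names are illustrative assumptions, not values taken from the paper.

```python
import time


def linear_params(d_in, d_out):
    """Parameters of a dense linear layer: weight matrix plus bias."""
    return d_in * d_out + d_out


def linear_macs(d_in, d_out, num_frames):
    """MACs needed to apply the dense layer to num_frames input vectors."""
    return d_in * d_out * num_frames


def low_rank_params(d_in, d_out, rank):
    """Parameters after factorizing W (d_in x d_out) into A (d_in x rank)
    and B (rank x d_out); the bias is kept unchanged."""
    return rank * (d_in + d_out) + d_out


def low_rank_macs(d_in, d_out, rank, num_frames):
    """MACs for applying A and then B to num_frames input vectors."""
    return rank * (d_in + d_out) * num_frames


def real_time_factor(processing_seconds, audio_seconds):
    """RTF below 1 means the audio is processed faster than real time."""
    return processing_seconds / audio_seconds


def measure_rtf(run_model, audio_seconds, repeats=5):
    """Time a zero-argument callable (a hypothetical stand-in for one
    forward pass) and return the best RTF over several repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_model()
        best = min(best, time.perf_counter() - start)
    return real_time_factor(best, audio_seconds)


if __name__ == "__main__":
    d_in, d_out, frames = 768, 3072, 500  # assumed sizes, for illustration only
    print("dense    params:", linear_params(d_in, d_out),
          "MACs:", linear_macs(d_in, d_out, frames))
    for rank in (128, 700):
        print(f"rank {rank:3d} params:", low_rank_params(d_in, d_out, rank),
              "MACs:", low_rank_macs(d_in, d_out, rank, frames))
    # Placeholder workload standing in for a model forward pass.
    print("RTF:", measure_rtf(lambda: sum(range(100_000)), audio_seconds=10.0))
```

In this sketch the factorization shrinks the layer only when the rank is below d_in * d_out / (d_in + d_out), which is about 614 for the assumed sizes, so the rank-700 case is larger than the dense layer. Parameter count and MACs are analytic quantities, whereas RTF depends on the hardware and implementation used for the measurement, which is one reason the three metrics need not rank methods in the same order.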
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques
