Effective and Efficient One-pass Compression of Speech Foundation Models Using Sparsity-aware Self-pinching Gates

Haoning Xu; Zhaoqing Li; Youjun Chen; Huimeng Wang; Guinan Li; Mengzhe Geng; Chengxi Deng; Xunying Liu

arXiv:2505.22608 · cs.SD · May 29, 2025

TL;DR

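This paper introduces a one-pass compression method for speech foundation models based on sparsity-aware self-pinching gates. It removes 60-65% of model parameters with no statistically significant WER increase on test-clean and needs at least 25% less compression time than previously published methods.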

Contribution

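The paper proposes a single-stage approach that integrates pruning with parameter updates: layer-level tied self-pinching gates, each containing a single learnable threshold, are trained jointly with the uncompressed model and used for neuron-level pruning.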

Findings

01

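Reduces wav2vec2.0-base parameters by 65% (HuBERT-large by 60%) with no statistically significant WER increase on test-clean

02

Achieves the lowest WER, 7.05% on test-clean, among previously published methods at a comparable compression ratio of 4.26x

03

Requires at least 25% less model compression time than previously published methods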
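The findings quote parameter reductions as percentages, while the abstract quotes a compression ratio (4.26x). A small sketch of how the two relate is given here, assuming the compression ratio is defined as the original parameter count divided by the remaining parameter count. That definition is an assumption and is not stated on this page. The function names are hypothetical.

```python
def reduction_to_ratio(reduction: float) -> float:
    """Convert a fractional parameter reduction (e.g. 0.65) to a compression ratio.

    Assumes ratio = original_params / remaining_params (an assumption,
    not a definition taken from the paper).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def ratio_to_reduction(ratio: float) -> float:
    """Convert a compression ratio (e.g. 4.26) to a fractional parameter reduction."""
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1")
    return 1.0 - 1.0 / ratio


# 65% fewer parameters (the wav2vec2.0-base figure)
print(round(reduction_to_ratio(0.65), 2))   # 2.86
# 60% fewer parameters (the HuBERT-large figure)
print(round(reduction_to_ratio(0.60), 2))   # 2.5
# 4.26x compression ratio (the setting of the 7.05% WER comparison)
print(round(ratio_to_reduction(4.26), 3))   # 0.765
```

Under this assumed definition, the 4.26x setting would correspond to removing roughly 76.5% of the parameters, a more aggressive operating point than the 65% setting reported without a significant WER increase.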

Abstract

This paper presents a novel approach to speech foundation model compression that tightly integrates model pruning and parameter update into a single stage. Highly compact layer-level tied self-pinching gates, each containing only a single learnable threshold, are jointly trained with uncompressed models and used in fine-grained neuron-level pruning. Experiments conducted on the LibriSpeech-100hr corpus suggest that our approach reduces the number of parameters of wav2vec2.0-base and HuBERT-large models by 65% and 60% respectively, while incurring no statistically significant word error rate (WER) increase on the test-clean dataset. Compared to previously published methods on the same task, our approach not only achieves the lowest WER of 7.05% on the test-clean dataset under a comparable model compression ratio of 4.26x, but also operates with at least 25% less model compression time.
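The abstract does not give the gate's formula, so the following is only a minimal sketch of what a threshold-based gate of this kind could look like. It assumes a sigmoid soft mask over weight magnitudes, a fixed temperature, and each scalar weight as the pruning unit. All three are illustrative assumptions, as are the function names and example values. The one detail taken from the abstract is that a whole layer shares a single learnable threshold.

```python
import math


def soft_gate(weight: float, threshold: float, temperature: float = 0.01) -> float:
    """Soft mask in (0, 1): close to 1 when |weight| exceeds the threshold.

    The sigmoid form and the temperature are assumptions for illustration.
    A smooth mask like this is differentiable in the threshold, which is
    what would allow the threshold to be trained with the model weights.
    """
    z = (abs(weight) - threshold) / temperature
    if z >= 0:                      # numerically stable sigmoid
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gate_layer(weights, threshold, temperature=0.01):
    """Apply one shared (layer-level tied) threshold to every weight in a layer."""
    return [w * soft_gate(w, threshold, temperature) for w in weights]


def hard_prune(weights, threshold):
    """After training, zero out every weight whose magnitude is at or below the threshold."""
    return [w if abs(w) > threshold else 0.0 for w in weights]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Hypothetical layer and threshold value, for illustration only.
layer = [0.80, -0.02, 0.05, -0.60, 0.01, 0.30]
threshold = 0.1

gated = gate_layer(layer, threshold)    # used in the forward pass during training
pruned = hard_prune(layer, threshold)   # used to produce the compressed model

print(pruned)            # [0.8, 0.0, 0.0, -0.6, 0.0, 0.3]
print(sparsity(pruned))  # 0.5
```

Because the gate holds only one scalar per layer, it adds a negligible number of parameters to the model being compressed, which is consistent with the abstract's description of the gates as highly compact.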

Taxonomy

Topics: Speech and Audio Processing · Advanced Data Compression Techniques · Neural Networks and Applications

Methods: Pruning