Hybrid Pruning: In-Situ Compression of Self-Supervised Speech Models for Speaker Verification and Anti-Spoofing
Junyi Peng, Lin Zhang, Jiangyu Han, Oldřich Plchot, Johan Rohdin, Themos Stafylakis, Shuai Wang, Jan Černocký

TL;DR
This paper presents a unified framework for in-situ structured pruning of self-supervised speech models, enabling significant compression with minimal performance loss for speaker verification and anti-spoofing tasks.
Contribution
It introduces a joint optimization approach that integrates pruning into fine-tuning, simplifying model compression for downstream speech tasks.
Findings
Achieves up to 70% parameter reduction with negligible performance loss.
Maintains low equal error rates (EERs) of 0.7%, 0.8%, and 1.6% on the Vox1 evaluation sets.
Improves generalization in low-resource scenarios, reaching 3.7% EER on ASVspoof5.
Abstract
Although large-scale self-supervised learning (SSL) models like WavLM have achieved state-of-the-art performance in speech processing, their significant size impedes deployment on resource-constrained devices. While structured pruning is a key technique for model compression, existing methods typically separate it from task-specific fine-tuning. This multi-stage approach struggles to create optimal architectures tailored for diverse downstream tasks. In this work, we introduce a unified framework that integrates structured pruning into the downstream fine-tuning process. Our framework unifies these steps, jointly optimizing for task performance and model sparsity in a single stage. This allows the model to learn a compressed architecture specifically for the end task, eliminating the need for complex multi-stage pipelines and knowledge distillation. Our pruned models achieve up to a…
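As a rough illustration of what "jointly optimizing for task performance and model sparsity in a single stage" can look like, here is a minimal Python sketch. It is not the authors' method: the abstract does not say how sparsity is parameterized or penalized. The sketch assumes one learnable gate logit per prunable group (for example an attention head or a block of feed-forward units), a sigmoid keep-probability, and a squared penalty on the gap to a target sparsity. All names (`expected_sparsity`, `joint_loss`, `group_sizes`, `weight`) are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, used here as the probability that a group is kept."""
    return 1.0 / (1.0 + math.exp(-x))


def expected_sparsity(gate_logits, group_sizes):
    """Expected fraction of parameters removed.

    gate_logits: one learnable logit per prunable group (assumption).
    group_sizes: number of parameters controlled by each gate.
    """
    total = sum(group_sizes)
    kept = sum(sigmoid(a) * n for a, n in zip(gate_logits, group_sizes))
    return 1.0 - kept / total


def joint_loss(task_loss, gate_logits, group_sizes, target_sparsity, weight=1.0):
    """Single-stage objective: task loss plus a penalty on the sparsity gap.

    The squared penalty is an illustrative choice, not taken from the paper.
    """
    gap = expected_sparsity(gate_logits, group_sizes) - target_sparsity
    return task_loss + weight * gap ** 2


# Two groups of 10 parameters each, gates at logit 0 (keep-probability 0.5).
logits = [0.0, 0.0]
sizes = [10, 10]
sparsity = expected_sparsity(logits, sizes)
loss = joint_loss(1.0, logits, sizes, target_sparsity=0.7)
```

In an actual fine-tuning loop, the same backward pass would update both the model weights (through the task loss) and the gate logits (through both terms), so the compressed architecture is learned for the end task rather than fixed beforehand.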
