Hybrid Pruning: In-Situ Compression of Self-Supervised Speech Models for Speaker Verification and Anti-Spoofing

Junyi Peng; Lin Zhang; Jiangyu Han; Oldřich Plchot; Johan Rohdin; Themos Stafylakis; Shuai Wang; Jan Černocký

arXiv:2508.16232·eess.AS·November 11, 2025

TL;DR

This paper presents a unified framework for in-situ structured pruning of self-supervised speech models, enabling significant compression with minimal performance loss for speaker verification and anti-spoofing tasks.

Contribution

The paper introduces a joint optimization approach that integrates structured pruning into fine-tuning, so that compressing a model for a downstream speech task no longer requires a separate stage.

Findings

1. Achieves up to 70% parameter reduction with negligible performance loss.

2. Maintains low EERs of 0.7%, 0.8%, and 1.6% on Vox1-O, Vox1-E, and Vox1-H, respectively.

3. Improves generalization in low-resource scenarios, reaching 3.7% EER on ASVspoof5.
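
The EERs quoted above are equal error rates: the operating point at which the false acceptance rate equals the false rejection rate. As a rough illustration only (this is not the paper's evaluation code), the sketch below approximates the EER by sweeping a threshold over a handful of made-up trial scores. The function name `equal_error_rate` and the toy scores are assumptions introduced here.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a non-target trial scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical): higher means "more likely the same speaker".
toy_targets = [0.9, 0.8, 0.7, 0.4]
toy_nontargets = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(toy_targets, toy_nontargets)
print(f"EER on toy scores: {toy_eer:.2%}")
```

Production toolkits usually interpolate along the ROC or DET curve rather than picking the nearest observed threshold, so values computed this way can differ slightly from reported numbers.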

Abstract

Although large-scale self-supervised learning (SSL) models like WavLM have achieved state-of-the-art performance in speech processing, their significant size impedes deployment on resource-constrained devices. While structured pruning is a key technique for model compression, existing methods typically separate it from task-specific fine-tuning. This multi-stage approach struggles to create optimal architectures tailored for diverse downstream tasks. In this work, we introduce a unified framework that integrates structured pruning into the downstream fine-tuning process, jointly optimizing for task performance and model sparsity in a single stage. This allows the model to learn a compressed architecture specifically for the end task, eliminating the need for complex multi-stage pipelines and knowledge distillation. Our pruned models achieve up to a…
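
The abstract describes jointly optimizing task performance and model sparsity in a single stage but, as excerpted here, does not say which regularizer or gating mechanism is used. The sketch below is therefore only one plausible shape for such an objective: each prunable unit carries a hypothetical learnable keep-probability, and a quadratic penalty (an assumption made for illustration) pulls the expected sparsity toward a target. The names `PrunableUnit`, `expected_sparsity`, `joint_loss`, and `prune`, and all numbers, are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class PrunableUnit:
    name: str         # hypothetical label, e.g. an attention head or FFN block
    num_params: int   # parameters removed if this unit is pruned
    keep_prob: float  # assumed learnable gate in [0, 1]; 1.0 means "keep"


def expected_sparsity(units):
    """Fraction of parameters expected to be removed under the current gates."""
    total = sum(u.num_params for u in units)
    kept = sum(u.keep_prob * u.num_params for u in units)
    return 1.0 - kept / total


def joint_loss(task_loss, units, target_sparsity, penalty_weight):
    """Task loss plus an assumed quadratic penalty on the sparsity gap."""
    gap = target_sparsity - expected_sparsity(units)
    return task_loss + penalty_weight * gap ** 2


def prune(units, threshold=0.5):
    """After training, keep only units whose gate is at or above the threshold."""
    return [u for u in units if u.keep_prob >= threshold]


# Toy configuration (all values hypothetical).
units = [
    PrunableUnit("head_0", 100, 1.0),
    PrunableUnit("head_1", 100, 0.0),
    PrunableUnit("ffn_0", 300, 0.5),
]
loss = joint_loss(task_loss=1.0, units=units, target_sparsity=0.7, penalty_weight=10.0)
kept_names = [u.name for u in prune(units)]
print(f"joint loss: {loss:.3f}, kept units: {kept_names}")
```

In a real training loop the gates and the model weights would be updated together by the same optimizer, so the task gradient and the sparsity penalty shape the retained architecture at the same time; that coupling is the point of a single-stage approach.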
