TL;DR
This paper introduces a structured pruning method guided by knowledge distillation to compress self-supervised speech models for speaker diarization, yielding models up to 80% smaller and up to 4x faster at inference while maintaining or improving accuracy across multiple datasets.
Contribution
The study systematically explores model compression for SSL-based diarization, demonstrating that simple structured pruning with distillation effectively reduces model size and inference time without performance loss.
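Distillation-guided pruning trains the pruned student to reproduce the hidden representations of the unpruned SSL teacher. The summary does not give the paper's exact distillation loss. The sketch below assumes a DistilHuBERT-style layer-wise objective: L1 distance minus the log-sigmoid of cosine similarity, averaged over matched layers and frames. The function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def l1_distance(a, b):
    # Mean absolute difference between two feature vectors of equal length.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cosine(a, b, eps=1e-8):
    # Cosine similarity; eps guards against division by zero for all-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)


def log_sigmoid(x):
    # log(sigmoid(x)) written as -log(1 + exp(-x)).
    return -math.log1p(math.exp(-x))


def distillation_loss(student_layers, teacher_layers):
    """Assumed layer-wise objective: mean over (layer, frame) pairs of
    L1(student, teacher) - log_sigmoid(cos(student, teacher)).

    Each argument is a list of layers; each layer is a list of frames;
    each frame is a list of floats. Student and teacher layers are paired
    in order (a hypothetical one-to-one layer mapping)."""
    total, count = 0.0, 0
    for s_layer, t_layer in zip(student_layers, teacher_layers):
        for s_frame, t_frame in zip(s_layer, t_layer):
            total += l1_distance(s_frame, t_frame) - log_sigmoid(cosine(s_frame, t_frame))
            count += 1
    return total / count


teacher = [[[1.0, 0.0]]]
print(distillation_loss([[[1.0, 0.0]]], teacher))  # matching features: low loss
print(distillation_loss([[[0.0, 1.0]]], teacher))  # orthogonal features: higher loss
```

The cosine term rewards matching the direction of the teacher's features, while the L1 term also penalizes differences in scale. Even a perfect match leaves a small residual (-log sigmoid(1) ≈ 0.313), so the loss is useful for comparison and gradient signal rather than as an absolute zero target.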
Findings
Achieves up to 80% model size reduction
Enables 4x faster inference without accuracy loss
Maintains or improves performance across diverse datasets
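Read literally, the two headline figures mean the following. The symbols are introduced here for illustration: $P$ is the parameter count and $t$ is the inference time. Both figures are "up to" values, so they need not come from the same pruned configuration.

```latex
\text{size reduction} = 1 - \frac{P_{\text{pruned}}}{P_{\text{orig}}} = 0.8
  \;\Rightarrow\; P_{\text{pruned}} = 0.2\,P_{\text{orig}},
\qquad
\text{speedup} = \frac{t_{\text{orig}}}{t_{\text{pruned}}} = 4
  \;\Rightarrow\; t_{\text{pruned}} = 0.25\,t_{\text{orig}}.
```

A 5x reduction in parameters therefore does not automatically give a 5x speedup. Runtime also depends on computation and overheads that need not shrink in proportion to the parameter count.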
Abstract
Self-supervised learning (SSL) models such as WavLM have substantially advanced speaker diarization by providing rich contextual speech representations. However, the high computational and memory costs of these models hinder deployment in real-time and resource-constrained scenarios. This work presents a systematic study on compressing SSL-based diarization models through structured pruning guided by knowledge distillation. We investigate pruning objectives that target both model parameters and computational complexity, and analyze alternative strategies, showing that a simple overall pruning approach provides the best balance between efficiency and accuracy. Our method achieves up to 80% model size reduction and 4x faster inference without performance degradation. Comprehensive experiments across eight public diarization datasets demonstrate that the pruned models consistently match or…
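The abstract says the pruning objectives target both parameter count and computational complexity. One common way to express such a budget is the Lagrangian penalty from L0-based structured pruning, as in CoFi or DPHuBERT. This is assumed here and is not confirmed as the paper's formulation. Learned keep-probabilities for attention heads and feed-forward (FFN) units give an expected cost, and the penalty λ1(ŝ − t) + λ2(ŝ − t)² pushes that expected cost ŝ toward a target t. All function names and per-unit cost formulas below are hypothetical, and they ignore biases and layer norms.

```python
def expected_params(head_keep, ffn_keep, d_model, d_head):
    """Expected parameter count of one Transformer layer.

    head_keep / ffn_keep: keep-probabilities in [0, 1] for each attention
    head and each FFN intermediate unit (1.0 = always kept)."""
    # Each head owns its Q, K, V slices (3 * d_model * d_head) plus its
    # slice of the output projection (d_head * d_model).
    attn = sum(head_keep) * 4 * d_model * d_head
    # Each FFN intermediate unit owns one row of W1 and one column of W2.
    ffn = sum(ffn_keep) * 2 * d_model
    return attn + ffn


def expected_macs(head_keep, ffn_keep, d_model, d_head, seq_len):
    """Expected multiply-accumulates for one layer over seq_len frames."""
    # Linear layers cost about one MAC per weight per frame; attention adds
    # Q·K^T and attention-weighted V, i.e. 2 * seq_len * d_head per head per frame.
    per_frame = (expected_params(head_keep, ffn_keep, d_model, d_head)
                 + sum(head_keep) * 2 * seq_len * d_head)
    return per_frame * seq_len


def lagrangian_penalty(expected, target, lam1, lam2):
    # lam1 and lam2 are trained adversarially (maximized) in L0-pruning setups,
    # so the constraint expected == target is enforced rather than merely encouraged.
    diff = expected - target
    return lam1 * diff + lam2 * diff * diff


# Hypothetical layer sizes: 12 heads of width 64, 3072 FFN units, d_model = 768.
full = expected_params([1.0] * 12, [1.0] * 3072, 768, 64)
half = expected_params([1.0] * 6 + [0.0] * 6, [1.0] * 1536 + [0.0] * 1536, 768, 64)
print(full, half, lagrangian_penalty(half, 0.2 * full, 0.5, 0.1))
```

Swapping `expected_params` for `expected_macs` inside the penalty turns the same mechanism into a compute-targeted objective. In this formulation, the choice between the two only changes which cost the budget constrains.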
Taxonomy
Methods: Pruning
