Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models

Jiangyu Han; Petr Pálka; Marc Delcroix; Federico Landini; Johan Rohdin; Jan Černocký; Lukáš Burget

arXiv:2506.18623 · eess.AS · November 20, 2025

TL;DR

This paper compresses self-supervised speech models for speaker diarization using structured pruning guided by knowledge distillation, reducing model size by up to 80% and speeding up inference by 4x while maintaining or improving accuracy across multiple datasets.

Contribution

The study systematically explores model compression for SSL-based diarization, demonstrating that simple structured pruning with distillation effectively reduces model size and inference time without performance loss.

Findings

01. Achieves up to 80% model size reduction
02. Enables 4x faster inference without accuracy loss
03. Maintains or improves performance across diverse datasets

Abstract

Self-supervised learning (SSL) models such as WavLM have substantially advanced speaker diarization by providing rich contextual speech representations. However, the high computational and memory costs of these models hinder deployment in real-time and resource-constrained scenarios. This work presents a systematic study on compressing SSL-based diarization models through structured pruning guided by knowledge distillation. We investigate pruning objectives that target both model parameters and computational complexity, and analyze alternative strategies, showing that a simple overall pruning approach provides the best balance between efficiency and accuracy. Our method achieves up to 80% model size reduction and 4x faster inference without performance degradation. Comprehensive experiments across eight public diarization datasets demonstrate that the pruned models consistently match or…
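As a rough illustration of the idea in the abstract, the sketch below removes whole hidden units from a toy feed-forward block (structured pruning) and then measures how far the pruned student's output drifts from the unpruned teacher's (the quantity a distillation objective would minimize during fine-tuning). It is a minimal sketch under stated assumptions, not the paper's method: the unit-importance score is a simple weight-norm heuristic swapped in for illustration, the toy dimensions are arbitrary, and all function names (`ffn_forward`, `prune_units`, `distill_loss`, `count_params`) are hypothetical. The 0.8 sparsity is chosen only to mirror the headline figure.

```python
import math
import random


def ffn_forward(x, w_in, w_out):
    """Toy feed-forward block: ReLU hidden units followed by a linear output.

    w_in[j]  holds the input weights of hidden unit j (length d_in).
    w_out[j] holds the output weights of hidden unit j (length d_out).
    """
    hidden = [max(0.0, sum(xi * wi for xi, wi in zip(x, row))) for row in w_in]
    d_out = len(w_out[0])
    return [sum(h * w_out[j][k] for j, h in enumerate(hidden)) for k in range(d_out)]


def unit_importance(w_in, w_out):
    """Assumed heuristic: product of the L2 norms of a unit's input and output weights."""
    return [
        math.sqrt(sum(v * v for v in w_in[j])) * math.sqrt(sum(v * v for v in w_out[j]))
        for j in range(len(w_in))
    ]


def prune_units(w_in, w_out, sparsity):
    """Structured pruning: drop entire hidden units, keeping the highest-scoring ones."""
    n_keep = max(1, round(len(w_in) * (1.0 - sparsity)))
    scores = unit_importance(w_in, w_out)
    ranked = sorted(range(len(w_in)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:n_keep])  # preserve the original unit order
    return [w_in[j] for j in keep], [w_out[j] for j in keep]


def count_params(w_in, w_out):
    """Number of weights in the block (biases omitted in this toy model)."""
    return sum(len(r) for r in w_in) + sum(len(r) for r in w_out)


def distill_loss(teacher_y, student_y):
    """Mean squared error between teacher and student outputs."""
    return sum((t - s) ** 2 for t, s in zip(teacher_y, student_y)) / len(teacher_y)


random.seed(0)
d_in, n_hidden, d_out = 8, 20, 8  # arbitrary toy sizes
teacher_in = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(n_hidden)]
teacher_out = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(n_hidden)]

# Remove 80% of the hidden units in one shot.
student_in, student_out = prune_units(teacher_in, teacher_out, sparsity=0.8)

x = [random.gauss(0, 1) for _ in range(d_in)]
loss = distill_loss(
    ffn_forward(x, teacher_in, teacher_out),
    ffn_forward(x, student_in, student_out),
)

before = count_params(teacher_in, teacher_out)
after = count_params(student_in, student_out)
print(f"parameters: {before} -> {after} ({1 - after / before:.0%} reduction)")
print(f"distillation loss before any fine-tuning: {loss:.4f}")
```

Because whole units are removed, the pruned block is a genuinely smaller dense model (320 weights shrink to 64 here), which is why structured pruning can translate into faster inference on ordinary hardware. In practice the student would then be trained to reduce the distillation loss rather than used as-is.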

Code & Models

Repositories

butspeechfit/diarizen (PyTorch, official)

Taxonomy

Methods: Pruning