Pruning as Regularization: Sensitivity-Aware One-Shot Pruning in ASR
Julian Irigoyen, Arthur Söhler, Andreas Søeborg Kirkedal

TL;DR
This paper demonstrates that one-shot magnitude pruning acts as an effective regularizer in ASR, improving generalization and enabling aggressive compression by identifying architecture-specific redundancies without fine-tuning.
Contribution
It introduces a sensitivity-aware pruning method that reveals architectural asymmetries and enhances one-shot pruning effectiveness in speech recognition models.
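As a rough illustration of what a Fisher-based sensitivity diagnostic can look like, the sketch below scores each component by the mean squared gradient of its parameters (the empirical diagonal Fisher) and ranks components from least to most sensitive. This is a common textbook definition, not necessarily the one used in the paper; the component names and the gradient values are made up for the example.

```python
def fisher_score(per_sample_grads):
    """Mean squared gradient over samples and parameters of one component.

    `per_sample_grads` is a list with one flat list of gradients per sample;
    all samples are assumed to have the same number of parameters.
    """
    n_samples = len(per_sample_grads)
    n_params = len(per_sample_grads[0])
    total = sum(g * g for grads in per_sample_grads for g in grads)
    return total / (n_samples * n_params)


def rank_components(grads_by_component):
    """Return component names ordered from least to most sensitive."""
    scores = {name: fisher_score(g) for name, g in grads_by_component.items()}
    return sorted(scores, key=scores.get)


# Toy gradients (hypothetical values): two samples, two parameters each.
toy_grads = {
    "decoder.ffn": [[0.5, -0.4], [0.6, 0.3]],
    "decoder.self_attn": [[0.01, 0.02], [0.0, -0.01]],
}
ranking = rank_components(toy_grads)
print(ranking)
```

Components at the front of the ranking would be the first candidates for aggressive pruning under this assumed scoring; averaging over parameters keeps components of different sizes comparable.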
Findings
Pruning 50% of decoder self-attention reduces WER by 2.38% absolute on LibriSpeech test-other, without fine-tuning.
Pruning the last four encoder layers at 50% improves WER by 1.72% absolute on the same test set.
Sensitivity-aware pruning enables 40% sparsity with minimal accuracy loss.
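The absolute figures above can be cross-checked against the relative figures quoted in the abstract (20.44% and 14.8%). Dividing an absolute drop by the matching relative drop gives the baseline WER it implies. The baseline itself is not stated on this page, so the values computed here are derived, not reported.

```python
def implied_baseline(absolute_drop, relative_drop):
    """Baseline WER (in %) implied by an absolute drop (percentage points)
    and the corresponding relative drop (as a fraction of the baseline)."""
    return absolute_drop / relative_drop


# Figures taken from the abstract; the baselines are derived, not reported.
decoder_attn = implied_baseline(2.38, 0.2044)
encoder_last4 = implied_baseline(1.72, 0.148)
print(round(decoder_attn, 2), round(encoder_last4, 2))
```

Both results land near 11.6% WER, so the two reported improvements are consistent with a single unpruned baseline.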
Abstract
We challenge the conventional view of neural network pruning as solely a compression technique, demonstrating that one-shot magnitude pruning serves as a powerful implicit regularizer for ASR. Using Whisper-small, we combine gradient- and Fisher-based sensitivity diagnostics with targeted, component-wise pruning. This reveals architectural asymmetries: decoder FFNs are pruning-fragile, whereas decoder self-attention and the last encoder layers contain redundancy that, when removed, improves generalization. Without fine-tuning, pruning 50% of decoder self-attention reduces WER by 2.38% absolute (20.44% relative) on LibriSpeech test-other; pruning the last four encoder layers at 50% instead yields a 1.72% absolute (14.8% relative) improvement. Gains persisted on Common Voice and TED-LIUM datasets. Beyond regularization benefits, our sensitivity-aware approach enables more aggressive…
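A minimal sketch of targeted, component-wise one-shot magnitude pruning, written in plain Python so it runs without a deep learning framework. It assumes a model can be represented as a mapping from parameter names to weight matrices; the names `decoder.self_attn.q` and `decoder.ffn.fc1` are hypothetical and do not correspond to Whisper's actual parameter names. The paper's exact procedure may differ.

```python
def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of entries with the smallest magnitude.

    `weights` is a matrix given as a list of rows. A new matrix is returned;
    the input is not modified. No retraining follows (one-shot).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    flat = [w for row in weights for w in row]
    k = int(len(flat) * sparsity)  # number of entries to zero
    # Indices of the k smallest-magnitude entries (ties broken by position).
    order = sorted(range(len(flat)), key=lambda i: abs(flat[i]))
    drop = set(order[:k])
    pruned, idx = [], 0
    for row in weights:
        new_row = []
        for w in row:
            new_row.append(0.0 if idx in drop else w)
            idx += 1
        pruned.append(new_row)
    return pruned


def prune_components(model, target_prefixes, sparsity):
    """Prune only the matrices whose name starts with one of the prefixes."""
    return {
        name: magnitude_prune(m, sparsity)
        if name.startswith(target_prefixes)
        else [list(row) for row in m]
        for name, m in model.items()
    }


# Toy model with hypothetical parameter names and made-up weights.
model = {
    "decoder.self_attn.q": [[0.9, -0.05], [0.02, -0.7]],
    "decoder.ffn.fc1": [[0.3, -0.01], [0.04, 0.6]],
}
pruned = prune_components(model, ("decoder.self_attn",), 0.5)
print(pruned)
```

Here only the self-attention matrix is pruned at 50% while the FFN matrix is copied unchanged, mirroring the abstract's distinction between redundant and pruning-fragile components.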
Taxonomy
Topics: Speech Recognition and Synthesis · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning
