Model Compression for DNN-based Speaker Verification Using Weight Quantization
Jingyu Li, Wei Liu, Zhaoyang Zhang, Jiong Wang, Tan Lee

TL;DR
This paper shows that weight quantization effectively compresses DNN-based speaker verification models such as ECAPA-TDNN and ResNet, reducing model size several-fold with no noticeable loss in performance. ResNet is the more robust of the two at lower bitwidths, which the authors suggest may be related to its smooth weight distribution.
Contribution
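The study applies weight quantization to compress two widely used SV models, ECAPA-TDNN and ResNet, and analyzes the robustness, generalization, and knowledge retention of the quantized models. It finds ResNet more robust to low-bitwidth quantization, possibly because of its smooth weight distribution.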
Findings
Model size can be reduced multiple times without noticeable degradation in performance
ResNet is more robust than ECAPA-TDNN under lower-bitwidth quantization
Quantized models retain most speaker-relevant knowledge
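The size reduction described above can be illustrated with a minimal sketch of weight quantization. This summary does not state which quantization scheme the paper uses, so the code below assumes a generic symmetric, per-tensor uniform quantizer; the function names (quantize_uniform, dequantize, compression_ratio) are hypothetical, and the weights are random values, not taken from ECAPA-TDNN or ResNet.

```python
import numpy as np


def quantize_uniform(w, bits):
    """Symmetric per-tensor uniform quantization (an assumed scheme,
    not necessarily the one used in the paper)."""
    qmax = 2 ** (bits - 1) - 1               # e.g. 127 for 8 bits
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Map each weight to the nearest integer level in [-qmax, qmax].
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from integer levels."""
    return q.astype(np.float32) * scale


def compression_ratio(bits, full_precision_bits=32):
    """Nominal size reduction, ignoring the cost of storing the scale."""
    return full_precision_bits / bits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-in for one layer's weights (illustrative only).
    weights = rng.normal(0.0, 0.05, size=(64, 64)).astype(np.float32)
    for bits in (8, 4):
        q, scale = quantize_uniform(weights, bits)
        err = float(np.max(np.abs(weights - dequantize(q, scale))))
        print(f"{bits}-bit: ratio {compression_ratio(bits):.0f}x, "
              f"max abs error {err:.6f}")
```

Under this scheme the rounding error of any weight is bounded by half the step size, so lower bitwidths trade a larger nominal compression ratio for coarser weights. How much that coarsening hurts verification accuracy is the empirical question the paper studies.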
Abstract
DNN-based speaker verification (SV) models demonstrate significant performance at relatively high computation costs. Model compression can be applied to reduce the model size for lower resource consumption. The present study exploits weight quantization to compress two widely-used SV models, namely ECAPA-TDNN and ResNet. Experimental results on VoxCeleb show that weight quantization is effective for compressing SV models. The model size can be reduced multiple times without noticeable degradation in performance. Compression of ResNet shows more robust results than ECAPA-TDNN with lower-bitwidth quantization. Analysis of the layer weights suggests that the smooth weight distribution of ResNet may be related to its better robustness. The generalization ability of the quantized model is validated via a language-mismatched SV task. Furthermore, analysis by information probing reveals that…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
Methods: Batch Normalization · 1x1 Convolution · Max Pooling · Average Pooling · Residual Connection · Bottleneck Residual Block · Residual Block · Convolution · Global Average Pooling
