Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization

Bei Liu; Haoyu Wang; Yanmin Qian

arXiv:2406.05359 · eess.AS · December 3, 2024 · 2 cites

TL;DR

This paper introduces adaptive neural network quantization techniques, including uniform and mixed precision methods, to create lightweight speaker verification models that outperform existing systems in size and accuracy.

Contribution

The paper proposes novel adaptive quantization methods and a multi-stage fine-tuning strategy for efficient, high-performance lightweight speaker verification models.

Findings

01

Achieved lossless 4-bit uniform quantization with 8x compression.

02

Mixed precision quantization improves performance with similar model size.

03

Binarized models with new schemes outperform previous lightweight SV systems.
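The size arithmetic behind the first two findings can be sketched as follows. This is an illustrative calculation only: the per-layer parameter counts and bit widths are hypothetical, a 32-bit full-precision baseline is assumed, and the storage cost of the centroid codebooks is ignored.

```python
def compression_ratio(layer_params, layer_bits, full_bits=32):
    """Ratio of full-precision size to quantized size, counting weight bits only."""
    full = sum(layer_params) * full_bits
    quantized = sum(n * b for n, b in zip(layer_params, layer_bits))
    return full / quantized

# Hypothetical parameter counts for three layers (not taken from the paper).
params = [1000, 4000, 2000]

# Uniform precision: every layer uses 4 bits.
uniform = compression_ratio(params, [4, 4, 4])

# Mixed precision: bit widths differ per layer but average 4 bits per weight.
mixed = compression_ratio(params, [8, 2, 6])

print(uniform, mixed)
```

In this toy setting both assignments give the same compression ratio, which is the sense in which a mixed precision model can match the size of a uniform one while spending its bits differently across layers.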

Abstract

Modern speaker verification (SV) systems typically demand expensive storage and computing resources, which hinders their deployment on mobile devices. In this paper, we explore adaptive neural network quantization for lightweight speaker verification. First, we propose a novel adaptive uniform precision quantization method that dynamically generates quantization centroids customized for each network layer based on k-means clustering. By applying it to pre-trained SV systems, we obtain a series of quantized variants with different bit widths. To enhance the performance of low-bit quantized models, a mixed precision quantization algorithm along with a multi-stage fine-tuning (MSFT) strategy is further introduced. Unlike uniform precision quantization, the mixed precision approach allows different bit widths to be assigned to different network layers. When bit…
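The idea of generating per-layer quantization centroids with k-means can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `kmeans_quantize`, the evenly spaced initialisation, the fixed iteration count, and the random stand-in weights are all assumptions made for illustration.

```python
import numpy as np

def kmeans_quantize(weights, bits, iters=20):
    """Cluster one layer's weights into 2**bits centroids with 1-D k-means."""
    flat = weights.ravel().astype(np.float64)
    k = 2 ** bits
    # Assumption: start with centroids spread evenly over the weight range.
    centroids = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iters):
        # Assign every weight to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each non-empty centroid to the mean of its members.
        for j in range(k):
            members = flat[idx == j]
            if members.size:
                centroids[j] = members.mean()
    # Final assignment against the updated centroids.
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, idx.reshape(weights.shape)

# Random weights standing in for one layer of a pre-trained SV model.
rng = np.random.default_rng(0)
layer = rng.normal(size=(64, 32))

centroids, codes = kmeans_quantize(layer, bits=4)
dequantized = centroids[codes]  # look up each weight's centroid
mse = np.mean((layer - dequantized) ** 2)
```

Each layer would store only its small centroid table plus a `bits`-wide code per weight, and running the function separately per layer is what makes the centroids layer-specific in this sketch.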

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing