Learnable Nonlinear Compression for Robust Speaker Verification
Xuechen Liu, Md Sahidullah, Tomi Kinnunen

TL;DR
This paper introduces data-driven nonlinear spectral feature compression techniques, including channel-dependent and multi-regime designs, to enhance robustness and accuracy in speaker verification systems, especially under challenging conditions.
Contribution
The paper proposes learnable nonlinear compression methods based on power functions and dynamic range compression (DRC), in channel-dependent form and with a multi-regime design aimed at improving robustness in speaker verification.
Findings
Equal error rate (EER) drops on both VoxCeleb1 and VoxMovies, with a maximum relative reduction of 21.6%.
Power nonlinearities outperform traditional logarithmic compression.
Multi-regime design enhances robustness under diverse conditions.
Abstract
In this study, we focus on nonlinear compression methods for spectral features in speaker verification based on deep neural networks. We consider different kinds of channel-dependent (CD) nonlinear compression methods optimized in a data-driven manner. Our methods are based on power nonlinearities and dynamic range compression (DRC). We also propose a multi-regime (MR) design for the nonlinearities, aimed at improving robustness. Results on VoxCeleb1 and VoxMovies data demonstrate improvements brought by the proposed compression methods over both the commonly used logarithm and their static counterparts, especially for the methods based on the power function. While CD generalization improves performance on VoxCeleb1, MR provides more robustness on VoxMovies, with a maximum relative equal error rate reduction of 21.6%.
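As a rough illustration of what a channel-dependent power nonlinearity could look like, the sketch below applies a separate exponent to each filterbank channel and contrasts it with static logarithmic compression. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`power_compress`, `log_compress`), the sigmoid parameterization that keeps each exponent in (0, 1), and the input values are all hypothetical and do not come from the paper.

```python
import math


def sigmoid(x):
    """Map an unconstrained parameter to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def power_compress(spectrogram, raw_params):
    """Channel-dependent power compression: y[t][c] = x[t][c] ** alpha[c].

    Assumption (not from the paper): alpha[c] = sigmoid(raw_params[c]),
    so each exponent stays strictly between 0 and 1.
    """
    alphas = [sigmoid(p) for p in raw_params]
    return [[x ** a for x, a in zip(frame, alphas)] for frame in spectrogram]


def log_compress(spectrogram, eps=1e-6):
    """Static logarithmic compression, the common baseline."""
    return [[math.log(x + eps) for x in frame] for frame in spectrogram]


# Two frames, three channels of non-negative filterbank energies (made-up values).
energies = [[1.0, 4.0, 16.0], [0.0, 9.0, 100.0]]

# sigmoid(0) = 0.5, so every channel starts as square-root compression.
raw_params = [0.0, 0.0, 0.0]

compressed = power_compress(energies, raw_params)
baseline = log_compress(energies)
```

In a data-driven setup, the entries of `raw_params` would be trained jointly with the speaker embedding network by backpropagation in a deep learning framework; here they are fixed so the forward pass can be inspected by hand. Note that the power function is defined at zero energy, whereas the logarithm needs the small offset `eps`.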
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Flow Measurement and Analysis
