Learnable Nonlinear Compression for Robust Speaker Verification

Xuechen Liu; Md Sahidullah; Tomi Kinnunen

arXiv:2202.05236 · cs.SD · February 11, 2022

TL;DR

This paper introduces data-driven nonlinear spectral feature compression techniques, including channel-dependent and multi-regime designs, to enhance robustness and accuracy in speaker verification systems, especially under challenging conditions.

Contribution

It proposes learnable nonlinear compression methods based on power functions and dynamic range compression (DRC), including channel-dependent variants and a multi-regime design for improved robustness in speaker verification.
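As a rough illustration of the static curves these methods build on, the sketch below applies the conventional log compression, a power law, and a dynamic range compressor to a single non-negative filterbank energy. It is a minimal sketch under assumptions: the function names and `eps` floors are hypothetical, and `drc_hard_knee` (with its `threshold_db` and `ratio` parameters) is a textbook hard-knee compressor curve, not necessarily the DRC formulation used in the paper.

```python
import math


def log_compress(x, eps=1e-6):
    """Conventional log compression of a non-negative filterbank energy."""
    return math.log(x + eps)


def power_compress(x, alpha):
    """Power-law compression x**alpha; 0 < alpha < 1 shrinks the dynamic range."""
    return x ** alpha


def drc_hard_knee(x, threshold_db=-20.0, ratio=4.0, eps=1e-10):
    """Hypothetical hard-knee DRC static curve applied in the dB (power) domain.

    Levels at or below the threshold pass through unchanged; above it,
    every `ratio` dB of input level yields only 1 dB of output level.
    """
    level_db = 10.0 * math.log10(x + eps)
    if level_db <= threshold_db:
        out_db = level_db
    else:
        out_db = threshold_db + (level_db - threshold_db) / ratio
    return 10.0 ** (out_db / 10.0)


if __name__ == "__main__":
    # Energies spanning six decades, as in quiet vs. loud spectral bins.
    for energy in [1e-4, 1e-2, 1.0, 100.0]:
        print(energy, log_compress(energy), power_compress(energy, 0.1), drc_hard_knee(energy))
```

With `alpha = 0.1`, six decades of input energy map to less than one decade of output. One way to relate the power law to the logarithm: (x**alpha - 1) / alpha tends to log(x) as alpha approaches 0, so a learnable exponent can move between strongly log-like and milder compression.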

Findings

01

Equal error rate (EER) reductions on the VoxCeleb1 and VoxMovies datasets, of up to 21.6% relative.

02

Power nonlinearities outperform traditional logarithmic compression.

03

Multi-regime design improves robustness, notably on VoxMovies.
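The findings are stated in terms of EER, the operating point where the false acceptance rate equals the false rejection rate, and of relative EER reduction against a baseline. A minimal sketch of both computations follows, assuming lists of target (same-speaker) and non-target trial scores where a higher score means "more likely the same speaker". The function names are hypothetical, and a simple threshold sweep is used rather than the interpolated estimates some evaluation toolkits implement.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over all observed scores.

    A trial is accepted when score >= threshold. At the threshold where the
    false rejection rate (FRR) and false acceptance rate (FAR) are closest,
    the average of the two rates is returned.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction of a new system with respect to a baseline."""
    return (baseline_eer - new_eer) / baseline_eer
```

For the toy scores `equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])` returns 0.25. A relative reduction of 21.6% means the new EER is 78.4% of the baseline EER.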

Abstract

In this study, we focus on nonlinear compression methods in spectral features for deep neural network-based speaker verification. We consider different kinds of channel-dependent (CD) nonlinear compression methods optimized in a data-driven manner. Our methods are based on power nonlinearities and dynamic range compression (DRC). We also propose a multi-regime (MR) design for the nonlinearities, aimed at improving robustness. Results on VoxCeleb1 and VoxMovies data demonstrate improvements brought by the proposed compression methods over both the commonly used logarithm and their static counterparts, especially for those based on the power function. While CD generalization improves performance on VoxCeleb1, MR provides more robustness on VoxMovies, with a maximum relative equal error rate reduction of 21.6%.
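The abstract distinguishes channel-dependent (CD) compression, where each spectral channel gets its own data-driven nonlinearity, from a multi-regime (MR) design. The sketch below illustrates both ideas in plain Python under explicit assumptions: `ChannelPowerCompression`, the sigmoid parameterization that keeps each exponent in (0, 1), and the one-knee `two_regime_power` curve are hypothetical choices for illustration, not the paper's exact formulations. The hand-written gradient shows why such exponents can be learned by backpropagation together with the network.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ChannelPowerCompression:
    """Hypothetical channel-dependent (CD) power compression.

    Each channel c has its own raw parameter theta[c]; the exponent
    alpha[c] = sigmoid(theta[c]) stays inside (0, 1), so every channel
    compresses rather than expands its dynamic range.
    """

    def __init__(self, num_channels, init_alpha=0.3):
        raw = math.log(init_alpha / (1.0 - init_alpha))  # inverse sigmoid
        self.theta = [raw] * num_channels

    def alphas(self):
        return [sigmoid(t) for t in self.theta]

    def forward(self, frame):
        """Compress one frame of non-negative energies (one value per channel)."""
        return [x ** a for x, a in zip(frame, self.alphas())]

    def grad_theta(self, frame, upstream):
        """Chain rule: dL/dtheta_c = dL/dy_c * y_c * ln(x_c) * alpha_c * (1 - alpha_c).

        Assumes every x_c > 0 (in practice the energies would be floored).
        """
        grads = []
        for x, a, g in zip(frame, self.alphas(), upstream):
            y = x ** a
            grads.append(g * y * math.log(x) * a * (1.0 - a))
        return grads


def two_regime_power(x, knee, alpha_low, alpha_high):
    """Hypothetical multi-regime (MR) power law with a single knee.

    Below the knee the exponent is alpha_low, above it alpha_high; the upper
    branch is scaled so the curve is continuous at x = knee.
    """
    if x <= knee:
        return x ** alpha_low
    return knee ** alpha_low * (x / knee) ** alpha_high
```

In a real system the raw parameters would be registered as trainable weights in an autodiff framework and updated by the same optimizer as the network, so `grad_theta` exists here only to make the chain rule explicit.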

