Parameterized Channel Normalization for Far-field Deep Speaker Verification
Xuechen Liu, Md Sahidullah, Tomi Kinnunen

TL;DR
This paper introduces parameterized normalization techniques, PCEN and PCMN, integrated into DNN-based speaker verification systems to reduce environmental mismatch effects, resulting in significant performance improvements on a large-scale far-field speech dataset.
Contribution
It proposes differentiable, trainable normalization methods, PCEN and PCMN, that can be jointly optimized with DNNs for robust far-field speaker verification.
Findings
Up to 39.5% relative reduction in equal error rate (EER) under mismatched conditions.
Outperforms conventional mel filterbank features on Hi-MIA, a large-scale far-field corpus.
Demonstrates the effectiveness of trainable normalization in real-world scenarios.
Abstract
We address far-field speaker verification with a deep neural network (DNN) based speaker embedding extractor, where mismatch between enrollment and test data often comes from convolutive effects (e.g. room reverberation) and noise. To mitigate these effects, we focus on two parametric normalization methods: per-channel energy normalization (PCEN) and parameterized cepstral mean normalization (PCMN). Both methods contain differentiable parameters and thus can be conveniently integrated into, and jointly optimized with, the DNN using automatic differentiation methods. We consider both fixed and trainable (data-driven) variants of each method. We evaluate the performance on Hi-MIA, a recent large-scale far-field speech corpus, with varied microphone and positional settings. Our methods outperform conventional mel filterbank features, with a maximum of 33.5% and 39.5% relative improvement on equal…
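Example (illustrative)
The abstract does not give the PCEN equations, so the following is only a minimal sketch of the commonly cited PCEN formulation: a first-order smoother M(t, f) = (1 - s) M(t - 1, f) + s E(t, f), followed by PCEN(t, f) = (E(t, f) / (eps + M(t, f))^alpha + delta)^r - delta^r. The function name `pcen`, the default parameter values, and the choice to initialise the smoother with the first frame are assumptions made for illustration and are not taken from the paper. This corresponds to a fixed variant; a trainable variant would presumably expose s, alpha, delta and r as learnable parameters in an automatic differentiation framework.

```python
import numpy as np


def pcen(mel_energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Fixed-parameter PCEN over a (frames, channels) array of mel energies.

    Parameter defaults are illustrative assumptions, not the paper's settings.
    """
    mel_energy = np.asarray(mel_energy, dtype=np.float64)

    # First-order IIR smoother along time, computed independently per channel.
    smoothed = np.empty_like(mel_energy)
    smoothed[0] = mel_energy[0]  # assumption: start from the first frame
    for t in range(1, mel_energy.shape[0]):
        smoothed[t] = (1.0 - s) * smoothed[t - 1] + s * mel_energy[t]

    # Adaptive gain control: divide each energy by its smoothed history.
    gain_normalised = mel_energy / (eps + smoothed) ** alpha

    # Root compression with an offset, shifted so that silence maps to zero.
    return (gain_normalised + delta) ** r - delta ** r
```

With alpha close to 1, the division by the smoothed energy makes the output nearly insensitive to a constant per-channel gain, which is one way to see why this kind of normalization may help against channel mismatch.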
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Face recognition and analysis
Methods: Test
