Optimized Power Normalized Cepstral Coefficients towards Robust Deep Speaker Verification
Xuechen Liu, Md Sahidullah, Tomi Kinnunen

TL;DR
This paper optimizes power normalized cepstral coefficients (PNCCs) for robust deep speaker verification. It removes the medium-time processing stage, which may be redundant for this task, and introduces channel energy normalization. The result is lower equal error rates than baseline PNCCs in both in-domain and cross-domain evaluations.
Contribution
The study revisits and refines PNCC features by ablating medium-time processing and adding channel energy normalization, enhancing speaker verification accuracy.
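The medium-time processor that the paper ablates operates on the filterbank power. As we understand the original PNCC design, its first step is a running average of each channel's power over 2M + 1 neighbouring frames. M = 2 is assumed below. The later noise suppression, temporal masking and spectral smoothing steps are not sketched here. The function name `medium_time_power` and the edge handling, which averages over whichever neighbours exist, are our own illustrative choices.

```python
import numpy as np

def medium_time_power(power: np.ndarray, m: int = 2) -> np.ndarray:
    """Average each frame's filterbank power over up to 2*m + 1 neighbouring frames.

    power: non-negative array of shape (num_frames, num_channels).
    Hypothetical helper; edge frames are averaged over the neighbours that exist.
    """
    num_frames = power.shape[0]
    out = np.empty_like(power, dtype=float)
    for t in range(num_frames):
        lo = max(0, t - m)                  # first frame in the window
        hi = min(num_frames, t + m + 1)     # one past the last frame in the window
        out[t] = power[lo:hi].mean(axis=0)  # per-channel average over time
    return out
```

Roughly speaking, ablating the stage means the later stages work from the short-time power directly rather than from weights derived from this kind of smoothed estimate.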
Findings
Up to 5.8% relative reduction in EER on VoxCeleb1 (in-domain)
Up to 61.2% relative reduction in EER on VoxMovies (cross-domain)
Significant robustness improvement in cross-domain scenarios
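The EER figures above are relative reductions with respect to baseline PNCCs, not absolute percentage-point differences. The following is a minimal sketch of that arithmetic. The function name and the example EER values are hypothetical and are not taken from the paper.

```python
def relative_eer_reduction(baseline_eer: float, proposed_eer: float) -> float:
    """Relative EER reduction in percent: 100 * (baseline - proposed) / baseline."""
    if baseline_eer <= 0:
        raise ValueError("baseline EER must be positive")
    return 100.0 * (baseline_eer - proposed_eer) / baseline_eer

# Hypothetical EERs (not from the paper): 4.0% baseline, 3.0% proposed.
print(relative_eer_reduction(4.0, 3.0))  # 25.0
```

By the same formula, a 61.2% relative reduction means the optimized features reach an EER equal to 38.8% of the baseline EER.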
Abstract
After their introduction to robust speech recognition, power normalized cepstral coefficient (PNCC) features were successfully adopted for other tasks, including speaker verification. However, because PNCC extraction involves long-term operations on the power spectrogram, its temporal processing and amplitude scaling steps dedicated to environmental compensation may be redundant. Further, these steps might suppress intrinsic speaker variations that are useful for speaker verification based on deep neural networks (DNNs). Therefore, in this study, we revisit and optimize PNCCs by ablating their medium-time processor and by introducing channel energy normalization. Experimental results with a DNN-based speaker verification system indicate substantial improvement over baseline PNCCs in both in-domain and cross-domain scenarios, reflected by maximum relative reductions of 5.8% and 61.2% in equal error rate on…
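The abstract does not spell out which channel energy normalization is used. A widely known form is per-channel energy normalization (PCEN) of Wang et al. (2017), and the sketch below implements that. It is an assumption, not a reproduction of the paper's method, that the paper uses this exact variant, these parameter values, or this placement in the PNCC pipeline. The defaults (s, alpha, delta, r, eps) follow values commonly cited for PCEN. Initialising the smoother with the first frame is our own choice.

```python
import numpy as np

def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a non-negative filterbank energy
    array of shape (num_frames, num_channels). Illustrative sketch only."""
    energy = np.asarray(energy, dtype=float)
    smooth = np.empty_like(energy)
    # First-order IIR smoother along time per channel, initialised with frame 0.
    smooth[0] = energy[0]
    for t in range(1, energy.shape[0]):
        smooth[t] = (1.0 - s) * smooth[t - 1] + s * energy[t]
    # Adaptive gain control: divide by the smoothed energy raised to alpha.
    agc = energy / (eps + smooth) ** alpha
    # Root compression with an offset, shifted so that zero input maps to zero.
    return (agc + delta) ** r - delta ** r
```

With alpha close to 1, the gain control makes the output nearly insensitive to a constant scaling of the input energy. For example, multiplying a constant input by 100 changes the output by less than 10%.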
