Optimized Power Normalized Cepstral Coefficients towards Robust Deep Speaker Verification

Xuechen Liu; Md Sahidullah; Tomi Kinnunen

arXiv:2109.12058 · cs.SD · September 27, 2021

TL;DR

This paper optimizes power normalized cepstral coefficients (PNCCs) for robust deep speaker verification by ablating the medium-time processor and introducing channel energy normalization, yielding relative equal error rate (EER) reductions of up to 5.8% on VoxCeleb1 and 61.2% on VoxMovies over baseline PNCCs.

Contribution

The study revisits and refines PNCC features by ablating medium-time processing and adding channel energy normalization, enhancing speaker verification accuracy.

Findings

1. Up to 5.8% relative EER reduction over baseline PNCCs on VoxCeleb1
2. Up to 61.2% relative EER reduction over baseline PNCCs on VoxMovies
3. Substantially improved robustness in cross-domain scenarios

Abstract

After their introduction to robust speech recognition, power normalized cepstral coefficient (PNCC) features were successfully adopted for other tasks, including speaker verification. However, because the PNCC extractor applies long-term operations to the power spectrogram, its temporal processing and amplitude scaling steps, which are dedicated to environmental compensation, may be redundant. Further, they might suppress intrinsic speaker variations that are useful for speaker verification based on deep neural networks (DNNs). Therefore, in this study, we revisit and optimize PNCCs by ablating the medium-time processor and by introducing channel energy normalization. Experimental results with a DNN-based speaker verification system indicate substantial improvement over baseline PNCCs in both in-domain and cross-domain scenarios, reflected by maximum relative reductions in equal error rate of 5.8% and 61.2% on…
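
The results above are reported as relative reductions in equal error rate (EER). As a minimal sketch of how such a figure is usually obtained, the Python below computes the EER from target and non-target trial scores with a simple threshold sweep, then the relative reduction between two systems. The function names and the example scores are hypothetical, chosen only for illustration; they are not taken from the paper, and the paper's own scoring tools may differ.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Return the EER as a fraction, using a threshold sweep over all scores.

    Assumption: a higher score means "more likely the same speaker".
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # Miss rate: fraction of target trials rejected (score below threshold).
    fnr = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: fraction of non-target trials accepted.
    fpr = np.array([(nontarget >= t).mean() for t in thresholds])

    # The EER is the operating point where the two error rates are closest.
    idx = np.argmin(np.abs(fnr - fpr))
    return (fnr[idx] + fpr[idx]) / 2.0


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Made-up scores for illustration only (not data from the paper).
example_target = [0.9, 0.8, 0.7, 0.4]
example_nontarget = [0.6, 0.3, 0.2, 0.1]
example_eer = equal_error_rate(example_target, example_nontarget)
```

Note that a relative reduction is measured against the baseline's EER, so a 61.2% relative reduction means the new EER is 38.8% of the baseline value, not that the EER dropped by 61.2 percentage points.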
