In Search of Autocorrelation Based Vocal Cord Cues for Speaker Identification
Md. Sahidullah, Goutam Saha

TL;DR
This paper derives autocorrelation-based features from the linear prediction (LP) residual of speech. These features capture vocal cord cues that complement traditional vocal tract features, and fusing the two improves speaker identification accuracy.
Contribution
It introduces a novel autocorrelation-based feature extraction method from LP residuals for speaker identification, demonstrating improved accuracy when fused with traditional features.
Findings
Autocorrelation features provide complementary vocal cord information.
Fusion of these features with traditional ones enhances speaker identification accuracy.
Results on two public databases support the improvement.
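As a rough illustration of how two information sources can be combined at identification time, the sketch below fuses per-speaker scores from a vocal tract system and a vocal cord system with a weighted sum. The summary does not say which fusion rule the authors use, so the weighted-sum rule, the function names `fuse_scores` and `identify`, the `weight` parameter, and the score values are all assumptions made for illustration.

```python
def fuse_scores(tract_scores, source_scores, weight=0.5):
    """Weighted sum of per-speaker scores (e.g. GMM log-likelihoods).

    Assumption: a simple linear score-level fusion; `weight` is a
    hypothetical tuning parameter, not a value from the paper.
    """
    speakers = tract_scores.keys() & source_scores.keys()
    return {
        s: weight * tract_scores[s] + (1.0 - weight) * source_scores[s]
        for s in speakers
    }


def identify(scores):
    """Return the speaker label with the highest score."""
    return max(scores, key=scores.get)


# Made-up scores for two enrolled speakers, for illustration only.
tract = {"spk_A": -10.0, "spk_B": -9.5}   # vocal tract system (e.g. MFCC)
source = {"spk_A": -4.0, "spk_B": -6.0}   # vocal cord system (residual features)

fused = fuse_scores(tract, source, weight=0.5)
print(identify(tract), identify(fused))
```

In this toy case the vocal tract system alone slightly prefers `spk_B`, while the fused scores prefer `spk_A`, which is the kind of correction complementary features can provide.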
Abstract
In this paper we investigate a technique for extracting vocal-source-based features from the LP residual of the speech signal for automatic speaker identification. Autocorrelation at specific lags is computed on the residual signal to derive these features. Compared to traditional features such as MFCC and PLPCC, which represent vocal tract information, these features represent complementary vocal cord information. Our experiments in fusing these two sources of information to represent speaker characteristics yield better speaker identification accuracy. We use Gaussian mixture model (GMM) based speaker modeling, and results are shown on two public databases to validate our proposition.
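The feature extraction described in the abstract can be sketched as follows: fit an LP model to a speech frame, inverse-filter the frame to obtain the residual, then take the residual's autocorrelation at a set of lags. This is a minimal sketch, not the authors' implementation. The LP order, the lag range, the normalization by the zero-lag value, and all function names are assumptions, since the abstract only says "some specific lag".

```python
import random


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a sequence."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def lpc(frame, order):
    """LP coefficients a_1..a_p via Levinson-Durbin.

    Convention: x_hat[n] = sum_k a_k * x[n - k].
    """
    r = autocorr(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or degenerate frame
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(frame, a):
    """Prediction error e[n] = x[n] - sum_k a_k * x[n - k]."""
    p = len(a)
    return [frame[n] - sum(a[k] * frame[n - 1 - k] for k in range(min(p, n)))
            for n in range(len(frame))]


def residual_autocorr_features(frame, lp_order=12, lags=tuple(range(1, 11))):
    """Normalized residual autocorrelation at the given lags.

    Assumption: lp_order and lags are illustrative defaults only.
    """
    e = lp_residual(frame, lpc(frame, lp_order))
    r = autocorr(e, max(lags))
    if r[0] == 0.0:
        return [0.0] * len(lags)
    return [r[k] / r[0] for k in lags]


# Synthetic stand-in for a speech frame: a first-order autoregressive signal.
random.seed(0)
frame = [0.0]
for _ in range(399):
    frame.append(0.9 * frame[-1] + random.gauss(0.0, 1.0))

features = residual_autocorr_features(frame)
print(len(features))
```

In a full system, one such feature vector per frame would be passed to the GMM speaker models alongside, or fused with, the vocal tract features.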
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
