On the use of Stress information in Speech for Speaker Recognition
Laxmi Narayana M., Sunil Kumar Kopparapu

TL;DR
This paper investigates how inherent stress patterns in speech, characterized by pitch, amplitude, and duration, can serve as additional cues to improve speaker recognition, especially under stress or emotional conditions.
Contribution
It introduces an approach that uses stress-related speech features, namely pitch, amplitude, and duration (PAD), as supplementary cues for improving speaker recognition accuracy.
Findings
PAD vectors of similar phones in different words spoken by the same speaker lie close together in the three-dimensional PAD space.
Stress patterns are unique to each speaker and can improve recognition.
Using stress information enhances speaker recognition robustness.
Abstract
The performance of a speaker recognition system decreases when the speaker is under stress or emotion. In this paper we explore and identify a mechanism that enables the use of inherent stress-in-speech, or speaking style information present in the speech of a person, as additional cues for speaker recognition. We quantify the inherent stress present in the speech of a speaker mainly using three features, namely pitch, amplitude and duration (together called PAD). We experimentally observe that the PAD vectors of similar phones in different words of a speaker are close to each other in the three-dimensional (PAD) space, confirming that the way a speaker stresses different syllables in their speech is unique to them. We therefore propose the use of the PAD-based speaking style of a speaker as an additional feature for speaker recognition applications.
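The closeness claim in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the PAD values below are invented for illustration, and the per-dimension z-score normalization and the Euclidean distance are assumptions made here so that pitch, amplitude, and duration are comparable. The sketch compares the mean distance between PAD vectors of the same speaker with the mean distance across two speakers.

```python
import math
from itertools import combinations, product


def normalize(vectors):
    """Z-score each PAD dimension so that pitch, amplitude, and duration share a scale.

    This normalization is an assumption of the sketch, not a step taken from the paper.
    """
    dims = list(zip(*vectors))
    means = [sum(d) / len(d) for d in dims]
    # Fall back to 1.0 when a dimension is constant, to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in d) / len(d)) or 1.0
        for d, m in zip(dims, means)
    ]
    return [tuple((x - m) / s for x, m, s in zip(v, means, stds)) for v in vectors]


def mean_distance(pairs):
    """Average Euclidean distance over an iterable of (vector, vector) pairs."""
    pairs = list(pairs)
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical PAD vectors (pitch in Hz, relative amplitude, duration in seconds)
# for the same phone taken from three different words per speaker.
speaker_a = [(120.0, 0.60, 0.11), (123.0, 0.62, 0.10), (118.0, 0.58, 0.12)]
speaker_b = [(210.0, 0.35, 0.19), (205.0, 0.38, 0.18), (214.0, 0.33, 0.20)]

normed = normalize(speaker_a + speaker_b)
a, b = normed[: len(speaker_a)], normed[len(speaker_a):]

# Distances between vectors of the same speaker vs. vectors of different speakers.
within = mean_distance(list(combinations(a, 2)) + list(combinations(b, 2)))
between = mean_distance(product(a, b))

print(f"mean within-speaker distance:  {within:.3f}")
print(f"mean between-speaker distance: {between:.3f}")
```

If PAD vectors behave as the abstract describes, the within-speaker distance should be clearly smaller than the between-speaker distance, which is what makes them usable as an additional speaker cue.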
