Emotion Invariant Speaker Embeddings for Speaker Identification with Emotional Speech
Biswajit Dev Sarma, Rohan Kumar Das

TL;DR
This paper introduces emotion invariant speaker embeddings for speaker identification with emotional speech. Test embeddings from different emotional states are mapped to an emotion invariant space, which compensates for the mismatch with speaker models that are generally trained on neutral speech.
Contribution
The work presents an extractor network that maps i-vector based test embeddings to an emotion invariant space, improving speaker identification across different emotional states.
Findings
Achieved a 2.6% absolute improvement in speaker identification accuracy.
Evaluated on four emotion classes from the IEMOCAP database.
Compensated for the mismatch between emotional states in the test embeddings.
Abstract
The emotional state of a speaker has a significant effect on speech production, which can cause speech to deviate from that produced in a neutral state. This makes identifying speakers across different emotions a challenging task, as speaker models are generally trained on neutral speech. In this work, we propose to overcome this problem by creating an emotion invariant speaker embedding. We learn an extractor network that maps test embeddings with different emotions, obtained using an i-vector based system, to an emotion invariant space. The resulting test embeddings thus become emotion invariant and thereby compensate for the mismatch between various emotional states. The studies are conducted using four different emotion classes from the IEMOCAP database. We obtain an absolute improvement of 2.6% in accuracy for speaker identification studies using emotion invariant speaker embedding against…
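The mapping idea in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: random vectors stand in for i-vectors, each emotion is modeled as an additive shift plus noise, a ridge-regularized linear map stands in for the extractor network, each speaker's neutral embedding is used as the training target, and identification uses cosine scoring. The names `fit_extractor` and `identify` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_SPEAKERS, N_EMOTIONS = 32, 10, 4

# Hypothetical stand-ins for i-vectors: one neutral embedding per speaker
# and one additive shift per emotion class (a toy model, not from the paper).
speakers = rng.normal(size=(N_SPEAKERS, DIM))
emotion_shift = 2.0 * rng.normal(size=(N_EMOTIONS, DIM))


def sample(n_per):
    """Draw n_per noisy embeddings per (speaker, emotion) pair with speaker labels."""
    spk = np.repeat(np.arange(N_SPEAKERS), N_EMOTIONS * n_per)
    emo = np.tile(np.repeat(np.arange(N_EMOTIONS), n_per), N_SPEAKERS)
    noise = 0.05 * rng.normal(size=(spk.size, DIM))
    return speakers[spk] + emotion_shift[emo] + noise, spk


def fit_extractor(x, targets, ridge=1e-3):
    """Ridge regression: find W so that x @ W approximates the target embeddings."""
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ targets)


def identify(test, models):
    """Cosine scoring: index of the best-matching speaker model per test embedding."""
    t = test / np.linalg.norm(test, axis=1, keepdims=True)
    m = models / np.linalg.norm(models, axis=1, keepdims=True)
    return np.argmax(t @ m.T, axis=1)


# Train the stand-in extractor on emotional embeddings paired with neutral targets.
train_x, train_spk = sample(20)
W = fit_extractor(train_x, speakers[train_spk])

# Compare identification on raw versus mapped test embeddings.
test_x, test_spk = sample(5)
raw_acc = float(np.mean(identify(test_x, speakers) == test_spk))
mapped_acc = float(np.mean(identify(test_x @ W, speakers) == test_spk))
print(f"raw accuracy: {raw_acc:.3f}, mapped accuracy: {mapped_acc:.3f}")
```

The numbers printed by this toy setup say nothing about the paper's reported results; the sketch only shows where a learned map sits between embedding extraction and scoring.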
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
