Identifying Speakers Using Their Emotion Cues
Ismail Shahin

TL;DR
This paper introduces a two-stage speaker identification system that leverages emotional cues, significantly improving accuracy by integrating emotion and speaker recognition using HMMs and SPHMMs.
Contribution
It presents a novel two-stage recognizer combining emotion and speaker recognition, enhancing identification accuracy over traditional methods.
Findings
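Achieved 79.92% average speaker identification accuracy with the proposed two-stage recognizer.
An improvement of 8.34 percentage points over the one-stage recognizer (71.58%), which the paper reports as significant.
Performance close to that of human listeners in subjective evaluation.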
Abstract
This paper formulates a new speaker identification approach that exploits knowledge of the emotional content of the speaker's speech. The proposed approach is based on a two-stage recognizer that integrates an emotion recognizer and a speaker recognizer into a single recognizer. It employs both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) as classifiers. The experiments consider six emotions: neutral, angry, sad, happy, disgust, and fear. Our results show that the average speaker identification performance of the proposed two-stage recognizer is 79.92%, a significant improvement over a one-stage recognizer, whose identification performance is 71.58%. The results obtained with the proposed approach are close to those achieved in subjective evaluation by human listeners.
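The two-stage idea can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that the first stage picks the most likely emotion and that the second stage then picks the most likely speaker among models trained for that emotion; the abstract does not spell out this ordering. A diagonal-Gaussian score stands in for the HMM/SPHMM likelihoods, and all names (`log_likelihood`, `two_stage_identify`, `spk_a`, `spk_b`) and the toy model parameters are hypothetical.

```python
import math

def log_likelihood(frames, model):
    """Diagonal-Gaussian log-likelihood of a sequence of feature frames.

    Stand-in (assumption) for the HMM/SPHMM scores used in the paper.
    `model` is a (mean, variance) pair of equal-length lists.
    """
    mean, var = model
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total

def identify_emotion(frames, emotion_models):
    """Stage 1: choose the emotion whose model scores the utterance highest."""
    return max(emotion_models, key=lambda e: log_likelihood(frames, emotion_models[e]))

def identify_speaker(frames, speaker_models, emotion):
    """Stage 2: choose the speaker among the models for the given emotion."""
    candidates = speaker_models[emotion]
    return max(candidates, key=lambda s: log_likelihood(frames, candidates[s]))

def two_stage_identify(frames, emotion_models, speaker_models):
    """Run emotion recognition first, then emotion-specific speaker recognition."""
    emotion = identify_emotion(frames, emotion_models)
    speaker = identify_speaker(frames, speaker_models, emotion)
    return emotion, speaker

# Toy models (hypothetical values): two emotions, two speakers, 2-D features.
emotion_models = {
    "neutral": ([0.0, 0.0], [1.0, 1.0]),
    "angry": ([3.0, 3.0], [1.0, 1.0]),
}
speaker_models = {
    "neutral": {"spk_a": ([-0.5, 0.0], [1.0, 1.0]), "spk_b": ([0.5, 0.0], [1.0, 1.0])},
    "angry": {"spk_a": ([2.5, 3.0], [1.0, 1.0]), "spk_b": ([3.5, 3.0], [1.0, 1.0])},
}

utterance = [[3.4, 2.9], [3.6, 3.1]]
result = two_stage_identify(utterance, emotion_models, speaker_models)
print(result)
```

By contrast, a one-stage recognizer in this sketch would score the utterance against a single set of speaker models without first conditioning on the emotion.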
