Speaker Verification in Emotional Talking Environments based on Three-Stage Framework

Ismail Shahin

arXiv:1804.00155 · cs.SD · April 3, 2018

TL;DR

This paper proposes a three-stage framework that cascades gender identification, emotion identification, and speaker verification to improve speaker verification performance in emotional talking environments, achieving results comparable to human judgment.

Contribution

The study introduces a novel three-stage framework that integrates gender and emotion cues to enhance speaker verification accuracy in emotional speech contexts.

Findings

01

Framework outperforms gender-only, emotion-only, and no-cue baselines

02

Performance is comparable to human listeners

03

Validated on two independent emotional speech datasets

Abstract

This work introduces, implements, and evaluates a three-stage speaker verification framework intended to improve speaker verification performance, which degrades in emotional talking environments. The framework comprises three cascaded stages: a gender identification stage, followed by an emotion identification stage, followed by a speaker verification stage. The proposed framework has been evaluated on two distinct and independent emotional speech datasets: our collected dataset and the Emotional Prosody Speech and Transcripts dataset. Our results demonstrate that speaker verification based on both gender and emotion cues outperforms speaker verification based on gender cues only, on emotion cues only, and on neither gender nor emotion cues. The average speaker verification performance achieved by the proposed methodology is very similar to that attained in…
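The abstract fixes only the order of the three stages, not the models behind them. As a rough illustration of how such a cascade can be wired, here is a minimal sketch under our own assumptions: every gender, emotion, speaker, and background model is a single diagonal Gaussian over feature frames (a hypothetical stand-in for whatever classifiers the paper actually uses), each identification stage picks the highest-scoring model, and the verification stage thresholds a log-likelihood ratio against a background model. All names (`DiagGaussian`, `best_label`, `three_stage_verify`, and the toy model tables) are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Frames = List[Sequence[float]]


@dataclass
class DiagGaussian:
    """Hypothetical stand-in for a trained model: a single diagonal Gaussian."""
    mean: Sequence[float]
    var: Sequence[float]

    def log_likelihood(self, frames: Frames) -> float:
        # Sum of per-frame diagonal-Gaussian log densities.
        total = 0.0
        for x in frames:
            for xi, mu, v in zip(x, self.mean, self.var):
                total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        return total


def best_label(models: Dict[str, DiagGaussian], frames: Frames) -> str:
    # Identification: return the label whose model scores the utterance highest.
    return max(models, key=lambda label: models[label].log_likelihood(frames))


def three_stage_verify(
    frames: Frames,
    claimed_speaker: str,
    gender_models: Dict[str, DiagGaussian],
    emotion_models: Dict[str, Dict[str, DiagGaussian]],
    speaker_models: Dict[Tuple[str, str], Dict[str, DiagGaussian]],
    background_models: Dict[Tuple[str, str], DiagGaussian],
    threshold: float = 0.0,
) -> Tuple[bool, str, str, float]:
    # Stage 1: gender identification.
    gender = best_label(gender_models, frames)
    # Stage 2: emotion identification, using only models of the detected gender.
    emotion = best_label(emotion_models[gender], frames)
    # Stage 3 (assumed LLR test): score the claimed speaker's model for the
    # detected (gender, emotion) pair against a background model for that pair.
    key = (gender, emotion)
    llr = (speaker_models[key][claimed_speaker].log_likelihood(frames)
           - background_models[key].log_likelihood(frames))
    return llr >= threshold, gender, emotion, llr


# Toy one-dimensional models, invented purely for illustration.
GENDER = {"female": DiagGaussian([2.0], [1.0]), "male": DiagGaussian([-2.0], [1.0])}
EMOTION = {
    "female": {"neutral": DiagGaussian([1.5], [1.0]), "angry": DiagGaussian([3.0], [1.0])},
    "male": {"neutral": DiagGaussian([-1.5], [1.0]), "angry": DiagGaussian([-3.0], [1.0])},
}
SPEAKERS = {("female", "angry"): {"spk1": DiagGaussian([3.2], [0.5])}}
BACKGROUND = {("female", "angry"): DiagGaussian([3.0], [2.0])}

if __name__ == "__main__":
    utterance = [[3.1], [3.0], [3.3]]
    print(three_stage_verify(utterance, "spk1", GENDER, EMOTION, SPEAKERS, BACKGROUND))
```

The point the sketch is meant to show is that each stage narrows the model set for the next: the speaker and background models are selected by the (gender, emotion) pair detected upstream, so an error in gender or emotion identification carries through to the verification decision.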
