Incorporation of Speech Duration Information in Score Fusion of Speaker Recognition Systems

Ali Khodabakhsh; Seyyed Saeed Sarfjoo; Umut Uludag; Osman Soyyigit; Cenk Demiroglu

arXiv:1608.02272·cs.SD·August 9, 2016

Open Access

TL;DR

This paper examines how speech duration affects speaker verification performance and proposes a score fusion method that improves accuracy by leveraging duration-specific information.

Contribution

It introduces a novel score fusion approach that incorporates speech duration information to enhance speaker recognition accuracy across varying durations.

Findings

01 Score fusion with duration info outperforms baseline methods

02 Performance degradation due to short speech durations is mitigated

03 The proposed method improves robustness in real-life scenarios

Abstract

In recent years, identity-vector (i-vector) based speaker verification (SV) systems have become very successful. Nevertheless, environmental noise and speech duration variability still significantly degrade the performance of these systems. In many real-life applications, recordings are very short; as a result, extracted i-vectors cannot reliably represent the attributes of the speaker. Here, we investigate the effect of speech duration on the performance of three state-of-the-art speaker recognition systems. In addition, using a variety of available score fusion methods, we investigate the effect of score fusion for those speaker verification techniques to benefit from the performance difference of different methods under different enrollment and test speech duration conditions. This technique performed significantly better than the baseline score fusion…
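The abstract is truncated and does not spell out the fusion rule, so the following is only a minimal sketch of one way duration information could enter score fusion: a weighted sum of per-system scores whose weights depend on the duration of the test utterance. The bin edges, the weight values, and the names `BIN_EDGES`, `WEIGHTS_BY_BIN`, `duration_bin`, and `fuse_scores` are all hypothetical and are not taken from the paper. The sketch also conditions only on test duration, whereas the abstract mentions both enrollment and test duration.

```python
import numpy as np

# Hypothetical duration bin edges in seconds (assumed, not from the paper).
BIN_EDGES = np.array([5.0, 20.0])

# Hypothetical fusion weights for three systems, one row per duration bin.
# In practice such weights would be trained on held-out trials.
WEIGHTS_BY_BIN = np.array([
    [0.6, 0.3, 0.1],  # short test utterances (< 5 s)
    [0.4, 0.4, 0.2],  # medium test utterances (5 s to < 20 s)
    [0.2, 0.3, 0.5],  # long test utterances (>= 20 s)
])


def duration_bin(duration_s, edges=BIN_EDGES):
    """Return the index of the duration bin that contains duration_s."""
    return int(np.searchsorted(edges, duration_s, side="right"))


def fuse_scores(scores, duration_s, weights_by_bin=WEIGHTS_BY_BIN):
    """Weighted sum of per-system scores, with weights picked by duration bin."""
    scores = np.asarray(scores, dtype=float)
    weights = weights_by_bin[duration_bin(duration_s)]
    return float(np.dot(weights, scores))


if __name__ == "__main__":
    # Same three system scores, fused under a short and a long test utterance.
    trial_scores = [1.2, -0.4, 0.8]
    print(fuse_scores(trial_scores, duration_s=3.0))
    print(fuse_scores(trial_scores, duration_s=45.0))
```

A duration-independent baseline corresponds to using the same weight row for every bin; the duration-aware variant lets the fused score lean on whichever system is assumed to be more reliable at a given utterance length.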

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing