A Multi Level Data Fusion Approach for Speaker Identification on Telephone Speech

Imen Trabelsi; Dorra Ben Ayed

arXiv:1407.0380·cs.SD·July 3, 2014·5 cites

Open Access

TL;DR

This paper proposes a multi-level data fusion approach to speaker identification on noisy telephone speech: two feature sets (MFCC and RASTA-PLP), each modeled with a GMM, are combined with SVM and naïve Bayes classifiers to compute a combined match score.

Contribution

It combines MFCC and RASTA-PLP feature sets with GMM, SVM, and naïve Bayes models for text-independent speaker identification on the telephone-degraded NTIMIT database.

Findings

01

Significant improvement in speaker identification accuracy with data fusion.

02

Effective use of SVM and Naive Bayes with GMM for feature modeling.

03

Enhanced robustness against noisy audio conditions.

Abstract

Many speaker identification systems perform well on clean speech but degrade under noisy audio conditions. To address this problem, we investigate the use of complementary information at different levels to compute a combined match score for the unknown speaker. In this work, we study the effect of two supervised machine learning approaches: support vector machines (SVM) and naïve Bayes (NB). We define two feature vector sets based on mel-frequency cepstral coefficients (MFCC) and relative spectral perceptual linear predictive coefficients (RASTA-PLP). Each feature set is modeled using a Gaussian mixture model (GMM). Several ways of combining these information sources give significant improvements in a text-independent speaker identification task on the very large, telephone-degraded NTIMIT database.
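
The idea of computing a combined match score from two feature streams can be illustrated with a minimal score-level fusion sketch. The abstract does not specify the fusion rule, so everything here is an assumption for illustration: the z-score normalization, the weighted sum, the function names (`z_normalize`, `fuse_scores`, `identify`), the `weight` parameter, and the per-speaker scores, which stand in for average log-likelihoods that GMMs trained on MFCC and RASTA-PLP features might produce.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Scale one subsystem's per-speaker scores to zero mean, unit variance."""
    vals = list(scores.values())
    mu = mean(vals)
    sigma = pstdev(vals) or 1.0  # fall back to 1.0 when all scores are equal
    return {spk: (s - mu) / sigma for spk, s in scores.items()}


def fuse_scores(mfcc_scores, plp_scores, weight=0.5):
    """Weighted sum of normalized scores; `weight` applies to the MFCC stream."""
    a = z_normalize(mfcc_scores)
    b = z_normalize(plp_scores)
    return {spk: weight * a[spk] + (1.0 - weight) * b[spk] for spk in a}


def identify(mfcc_scores, plp_scores, weight=0.5):
    """Return the enrolled speaker with the highest fused score."""
    fused = fuse_scores(mfcc_scores, plp_scores, weight)
    return max(fused, key=fused.get)


# Hypothetical average log-likelihoods of one test utterance under each
# enrolled speaker's model (made-up numbers, not results from the paper).
mfcc_scores = {"spk_a": -41.2, "spk_b": -39.8, "spk_c": -44.0}
plp_scores = {"spk_a": -52.1, "spk_b": -55.3, "spk_c": -57.9}

best = identify(mfcc_scores, plp_scores)
```

In this toy case the MFCC stream alone would pick `spk_b`, while the equally weighted fusion picks `spk_a`, which shows how a second feature stream can change the decision. Normalizing before summing matters because raw log-likelihoods from different feature spaces are not on a comparable scale.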

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing