Multi-Task Learning with High-Order Statistics for X-vector based Text-Independent Speaker Verification

Lanhua You; Wu Guo; Lirong Dai; Jun Du

arXiv:1903.12058·eess.AS·April 5, 2019·1 cites

TL;DR

This paper introduces a multi-task learning approach for x-vector speaker verification that adds reconstruction of the input utterance's first- and higher-order statistics as an auxiliary task, making the embeddings more discriminative and robust and improving on the original x-vector approach on the NIST SRE16 and VOiCES datasets.

Contribution

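The paper proposes a multi-task training framework that combines speaker classification (the primary task) with reconstruction of the first- and higher-order statistics of the input utterance (the auxiliary task) to improve x-vector embeddings for speaker verification.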

Findings

01

Outperforms original x-vector approach on NIST SRE16 and VOiCES datasets.

02

Achieves higher discriminability and robustness with minimal additional complexity.

03

Demonstrates effectiveness of high-order statistics in speaker embedding training.

Abstract

X-vector based deep neural network (DNN) embedding systems have proven effective for text-independent speaker verification. This paper presents a multi-task learning architecture for training the speaker embedding DNN with the primary task of classifying the target speakers and the auxiliary task of reconstructing the first- and higher-order statistics of the original input utterance. The proposed training strategy combines supervised and unsupervised learning in one framework to make the speaker embeddings more discriminative and robust. Experiments are carried out on the NIST SRE16 evaluation dataset and the VOiCES dataset. The results demonstrate that the proposed method outperforms the original x-vector approach with very little additional complexity.
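The multi-task objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the choice of statistics (mean, standard deviation, skewness, kurtosis), the mean-squared-error reconstruction loss, and the `weight` value are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def utterance_statistics(frames, max_order=4, eps=1e-8):
    """Per-dimension statistics of one utterance (hypothetical helper).

    frames: (T, D) array of frame-level features.
    Returns a (max_order * D,) vector: mean, standard deviation, then
    standardized central moments of order 3..max_order.
    """
    mean = frames.mean(axis=0)
    centered = frames - mean
    std = np.sqrt((centered ** 2).mean(axis=0))
    stats = [mean, std]
    for k in range(3, max_order + 1):
        # Standardized k-th central moment (k=3: skewness, k=4: kurtosis).
        stats.append((centered ** k).mean(axis=0) / (std + eps) ** k)
    return np.concatenate(stats[:max_order])

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example (primary task)."""
    shifted = logits - logits.max()  # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def multitask_loss(logits, label, reconstruction, target, weight=0.1):
    """Speaker classification loss plus weighted reconstruction loss.

    The MSE form and the default weight are assumptions, not values
    taken from the paper.
    """
    mse = np.mean((reconstruction - target) ** 2)
    return cross_entropy(logits, label) + weight * mse

# Usage with random data standing in for real features and network outputs.
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 23))       # 200 frames, 23-dim features
target = utterance_statistics(frames)     # auxiliary-task target
logits = rng.normal(size=10)              # stand-in speaker logits
reconstruction = target + 0.05 * rng.normal(size=target.shape)
loss = multitask_loss(logits, label=3, reconstruction=reconstruction, target=target)
```

In a real system the logits and the reconstruction would both be produced by heads on top of the shared embedding network, so the gradient of the auxiliary term also shapes the speaker embedding.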

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing