Full-info Training for Deep Speaker Feature Learning

Lantian Li; Zhiyuan Tang; Dong Wang; Thomas Fang Zheng

arXiv:1711.00366 · cs.SD · February 28, 2018

TL;DR

This paper introduces a full-info training method for deep speaker feature learning that removes the parametric classifier to enhance the discriminative power of features, resulting in improved speaker verification performance.

Contribution

It proposes a novel full-info training approach that discards the parametric classifier, enabling the feature network to learn more discriminative speaker representations.

Findings

01 Improved speaker verification accuracy on Fisher database.
02 More coherent and discriminative speaker features.
03 Performance gains over traditional training methods.

Abstract

Recent studies have shown that speaker patterns can be learned from very short speech segments (e.g., 0.3 seconds) by a carefully designed convolutional & time-delay deep neural network (CT-DNN) model. By training the model to discriminate between the speakers in the training data, frame-level speaker features can be derived from the last hidden layer. Despite its good performance, a potential problem with the present model is that it involves a parametric classifier, i.e., the last affine layer, which may absorb some of the discriminative knowledge and thus cause an 'information leak' in the feature learning. This paper presents a full-info training approach that discards the parametric classifier and forces all the discriminative knowledge to be learned by the feature net. Our experiments on the Fisher database demonstrate that this new training scheme can produce more coherent features,…
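
The abstract does not say what takes the place of the last affine layer. As a rough illustration only, the sketch below assumes one possible parameter-free classifier: each speaker is represented by the mean of that speaker's own frame-level features, and a frame is scored by its inner product with each mean. The function names, the toy data, and the choice of inner-product scoring are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def class_mean_logits(features, labels, num_speakers):
    """Score frames against per-speaker mean vectors (hypothetical helper).

    No trainable classifier weights are involved: the "classifier" is fully
    determined by the features themselves. Every speaker must own at least
    one frame, otherwise its mean is undefined.
    """
    means = np.stack(
        [features[labels == s].mean(axis=0) for s in range(num_speakers)]
    )
    return features @ means.T  # shape: (num_frames, num_speakers)

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy data (assumed): 2 speakers, 3 frames each, 2-dimensional features.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 1, 1, 1])
centers = np.array([[3.0, 0.0], [0.0, 3.0]])
features = centers[labels] + 0.1 * rng.standard_normal((6, 2))

logits = class_mean_logits(features, labels, num_speakers=2)
loss = cross_entropy(logits, labels)
predicted = logits.argmax(axis=1)
```

In a real training loop the features would come from the feature net, and the loss gradient would flow only into that net, since this stand-in classifier has no parameters of its own to absorb discriminative knowledge.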

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing