Unified Hypersphere Embedding for Speaker Recognition

Mahdi Hajibabaei; Dengxin Dai

arXiv:1807.08312 · eess.AS · July 24, 2018 · 51 cites

TL;DR

This paper introduces a unified hypersphere embedding approach for speaker recognition that improves accuracy without increasing model complexity, relying instead on data augmentation, a better-chosen embedding dimensionality, and a new loss function.

Contribution

It proposes a novel hypersphere embedding method with a logistic margin loss, improving speaker recognition accuracy without larger datasets or deeper models.
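As a rough illustration of what "hypersphere embedding" usually means, the sketch below L2-normalizes embedding vectors onto the unit hypersphere and scores a verification trial by cosine similarity. This is an assumption about the general convention, not the paper's confirmed pipeline: the function names `l2_normalize` and `cosine_score` are hypothetical, and the logistic margin loss itself is not reproduced here because this summary does not give its form.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Project a vector onto the unit hypersphere (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    # eps guards against division by zero for an all-zero vector.
    return [x / max(norm, eps) for x in vec]


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embeddings after unit-length normalization."""
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


if __name__ == "__main__":
    # Two toy "speaker embeddings"; real ones would come from the network.
    enrolled = [0.2, -1.3, 0.7]
    test = [0.1, -1.1, 0.9]
    print(cosine_score(enrolled, test))
```

Once embeddings live on the unit sphere, only their direction matters, so a verification decision reduces to thresholding this score.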

Findings

1. Repetition and time-reversion of utterances reduce prediction errors by up to 18%.
2. Lower-dimensional embeddings are more effective for verification.
3. The proposed loss function achieves state-of-the-art identification accuracy.

Abstract

Incremental improvements in the accuracy of Convolutional Neural Networks are usually achieved through the use of deeper and more complex models trained on larger datasets. However, enlarging datasets and models increases computation and storage costs and cannot be done indefinitely. In this work, we seek to improve the identification and verification accuracy of a text-independent speaker recognition system without extra data or deeper and more complex models by augmenting the training and testing data, finding the optimal dimensionality of the embedding space, and using more discriminative loss functions. Results of experiments on the VoxCeleb dataset suggest that: (i) Simple repetition and random time-reversion of utterances can reduce prediction errors by up to 18%. (ii) Lower-dimensional embeddings are more suitable for verification. (iii) Use of proposed logistic margin loss function…
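The augmentation in point (i) can be sketched in a few lines. The version below is a minimal illustration under stated assumptions, not the authors' implementation: it operates on a raw waveform held as a Python list, tiles the utterance to a fixed length before optionally reversing it, and uses a reversal probability of 0.5. The function names, the ordering of the two steps, and the `reverse_prob` parameter are all hypothetical; the paper may apply these operations differently (for example, on spectrograms).

```python
import random
from typing import List, Optional


def repeat_to_length(samples: List[float], target_len: int) -> List[float]:
    """Tile an utterance end to end until it fills target_len samples."""
    if not samples:
        raise ValueError("utterance must contain at least one sample")
    repeats = -(-target_len // len(samples))  # ceiling division
    return (samples * repeats)[:target_len]


def augment(
    samples: List[float],
    target_len: int,
    reverse_prob: float = 0.5,  # assumed value, not taken from the paper
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Repeat the utterance to a fixed length, then randomly time-reverse it."""
    rng = rng or random.Random()
    out = repeat_to_length(samples, target_len)
    if rng.random() < reverse_prob:
        out = out[::-1]  # time-reversion: play the samples backwards
    return out


if __name__ == "__main__":
    utterance = [0.1, 0.2, 0.3]
    print(augment(utterance, target_len=8, rng=random.Random(0)))
```

Passing a seeded `random.Random` keeps the augmentation reproducible, which is convenient when the same transformation must be applied at both training and test time.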

Code & Models

Repositories

MahdiHajibabaei/unified-embedding (caffe2, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing