Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks

Kin Wai Cheuk; Balamurali B. T.; Gemma Roig; Dorien Herremans

arXiv:1910.01463·cs.SD·October 7, 2019

TL;DR

This paper introduces a Triplet Neural Network approach to speaker recognition that outperforms traditional methods, especially on small datasets with limited samples per speaker, by effectively learning a discriminative latent space.

Contribution

The paper demonstrates that Triplet Neural Networks can significantly improve speaker recognition accuracy on small datasets compared to baseline models.

Findings

01. Outperforms the baseline by 23% on the MCE 2018 dataset.

02. Reduces confusions by 46% in low-data scenarios.

03. Effective in small-sample speaker recognition tasks.

Abstract

We present an approach to tackle the speaker recognition problem using Triplet Neural Networks. Currently, the $i$-vector representation with probabilistic linear discriminant analysis (PLDA) is the most commonly used technique to solve this problem, due to its high classification accuracy and relatively short computation time. In this paper, we explore a neural network approach, namely Triplet Neural Networks (TNNs), to build a latent space for different classifiers to tackle the Multi-Target Speaker Detection and Identification Challenge Evaluation 2018 (MCE 2018) dataset. The training set contains $i$-vectors from 3,631 speakers, with only 3 samples for each speaker, which makes speaker recognition a challenging task. When using the train and development set for training both the TNN and baseline model (i.e., similarity evaluation directly on the $i$-vector representation), our…
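
The triplet objective that gives TNNs their name can be illustrated with a minimal sketch. It is not the authors' implementation: the squared Euclidean distance, the margin of 1.0, the function name `triplet_loss`, and the toy 2-D embeddings are all assumptions made for illustration, since the excerpt above does not state the distance, margin, or network architecture used in the paper. The idea is that an anchor embedding should sit closer to a sample from the same speaker (positive) than to a sample from a different speaker (negative) by at least the margin.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss over a batch of embeddings.

    All inputs have shape (batch, dim). The distance (squared Euclidean)
    and the default margin are illustrative assumptions, not values
    taken from the paper.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # anchor to same speaker
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # anchor to other speaker
    # Loss is zero once the negative is farther than the positive by `margin`.
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()


# Toy embeddings standing in for the latent vectors a TNN would output.
anchor = np.array([[0.0, 0.0]])
positive = np.array([[0.1, 0.0]])  # hypothetical same-speaker sample
negative = np.array([[3.0, 0.0]])  # hypothetical different-speaker sample

easy_loss = triplet_loss(anchor, positive, negative)  # well-separated triplet
hard_loss = triplet_loss(anchor, negative, positive)  # roles swapped: violated
print(easy_loss, hard_loss)
```

In a real system the embeddings would come from a trained network applied to $i$-vectors, and the loss would be minimized with a gradient-based optimizer; the sketch only shows how the loss distinguishes a well-separated triplet from a violated one.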

Code & Models

Repositories

KinWaiCheuk/MCE2018 (TensorFlow, official)