TL;DR
This paper introduces a Triplet Neural Network approach to speaker recognition that outperforms traditional methods, especially on small datasets with limited samples per speaker, by effectively learning a discriminative latent space.
Contribution
The paper demonstrates that Triplet Neural Networks can significantly improve speaker recognition accuracy on small datasets compared to baseline models.
Findings
Outperforms baseline by 23% on the MCE 2018 dataset.
Reduces confusions by 46% in low-data scenarios.
Effective in small-sample speaker recognition tasks.
Abstract
We present an approach to tackle the speaker recognition problem using Triplet Neural Networks. Currently, the i-vector representation with probabilistic linear discriminant analysis (PLDA) is the most commonly used technique to solve this problem, due to its high classification accuracy and relatively short computation time. In this paper, we explore a neural network approach, namely Triplet Neural Networks (TNNs), to build a latent space for different classifiers on the Multi-Target Speaker Detection and Identification Challenge Evaluation 2018 (MCE 2018) dataset. The training set contains i-vectors from 3,631 speakers, with only 3 samples for each speaker, which makes speaker recognition a challenging task. When using the train and development set for training both the TNN and baseline model (i.e., similarity evaluation directly on the i-vector representation), our…
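As a rough illustration of the idea behind a triplet network, the sketch below computes a standard hinge-style triplet loss on toy embeddings. It is not the paper's implementation: the Euclidean distance, the margin of 1.0, the function names, and the 2-D vectors are all assumptions made for this example. The loss is zero when the anchor is closer to a same-speaker sample than to a different-speaker sample by at least the margin, and positive otherwise.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The distance function and margin value are assumptions for this
    sketch, not settings taken from the paper.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings (made-up values, not MCE 2018 i-vectors).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]  # same speaker as the anchor, distance 1
negative = [3.0, 4.0]  # different speaker, distance 5

# Same-speaker sample is already much closer: 1 - 5 + 1 < 0, so no penalty.
well_separated = triplet_loss(anchor, positive, negative)

# Roles swapped: the "positive" is far away, 5 - 1 + 1 = 5, so a penalty applies.
violating = triplet_loss(anchor, negative, positive)

print(well_separated, violating)
```

In training, a loss of this kind would be averaged over many (anchor, positive, negative) triplets and minimised with respect to the network weights that produce the embeddings; with only 3 samples per speaker, how the triplets are chosen is likely to matter.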
