Contrastive-mixup learning for improved speaker verification
Xin Zhang, Minho Jin, Roger Cheng, Ruirui Li, Eunjung Han, and Andreas Stolcke

TL;DR
This paper introduces contrastive-mixup, a novel data augmentation method for speaker verification that enhances model robustness and reduces error rates, especially with limited training data, by combining mixup with metric learning.
Contribution
It proposes a new contrastive-mixup strategy that integrates mixup with prototypical loss for improved speaker verification performance.
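As a rough illustration of how mixup might be combined with a prototypical loss, the sketch below mixes query embeddings within an episode and weights the loss toward both source speakers. This is one plausible reading, not the paper's exact formulation: the function name `mixed_prototypical_loss`, the use of squared Euclidean distance, and mixing at the embedding level (rather than on the network inputs) are all assumptions made to keep the example self-contained.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the last axis of a 2-D array.
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def mixed_prototypical_loss(support, query, lam, perm):
    """Hypothetical mixup-aware prototypical loss (illustrative only).

    support: (N, S, D) array, S support embeddings for each of N speakers
    query:   (N, D) array, row i is a query embedding from speaker i
    lam:     mixing weight in [0, 1]
    perm:    (N,) integer array, mixing partner for each query
    """
    prototypes = support.mean(axis=1)                  # (N, D) speaker centroids
    q_mix = lam * query + (1.0 - lam) * query[perm]    # convex mix of queries
    # Squared Euclidean distance from every mixed query to every prototype.
    d2 = ((q_mix[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    logp = log_softmax(-d2)                            # closer prototype, higher score
    idx = np.arange(len(query))
    # Soft target: weight lam on the original speaker, 1 - lam on the partner.
    loss = -(lam * logp[idx, idx] + (1.0 - lam) * logp[idx, perm])
    return loss.mean()
```

With `lam = 1.0` the sketch reduces to an ordinary prototypical loss, since each query is left unmixed and only its own speaker's prototype is targeted.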
Findings
Contrastive-mixup outperforms baseline methods in speaker verification.
With limited training data, the error rate drops by 16% in relative terms.
The method generalizes better when fewer training utterances are available.
Abstract
This paper proposes a novel formulation of prototypical loss with mixup for speaker verification. Mixup is a simple yet efficient data augmentation technique that fabricates a weighted combination of random data point and label pairs for deep neural network training. Mixup has attracted increasing attention due to its ability to improve robustness and generalization of deep neural networks. Although mixup has shown success in diverse domains, most applications have centered around closed-set classification tasks. In this work, we propose contrastive-mixup, a novel augmentation strategy that learns distinguishing representations based on a distance metric. During training, mixup operations generate convex interpolations of both inputs and virtual labels. Moreover, we have reformulated the prototypical loss function such that mixup is enabled on metric learning objectives. To demonstrate…
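The mixup operation described in the abstract, a convex interpolation of inputs and their labels, can be sketched in a few lines. This is the standard classification-style mixup, not the paper's contrastive variant. The helper name `mixup_batch`, the default `alpha=0.2`, and the Beta-distributed mixing weight follow common practice and are assumptions here.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Standard mixup: blend each example with a randomly chosen partner.

    x:        (B, ...) array of inputs
    y_onehot: (B, C) array of one-hot (or soft) labels
    alpha:    Beta distribution parameter (assumed default, not from the paper)
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)           # mixing weight in [0, 1]
    perm = rng.permutation(len(x))         # partner index for every example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam
```

Because the same weight is applied to inputs and labels, each mixed label row still sums to one and can be used as a soft target.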
Taxonomy
Methods: Mixup
