DropClass and DropAdapt: Dropping classes for deep speaker representation learning
Chau Luu, Peter Bell, Steve Renals

TL;DR
This paper introduces DropClass and DropAdapt, two methods that train and adapt deep speaker embeddings by dropping classes during training. On VoxCeleb speaker verification they yield relative equal error rate (EER) improvements of 7.9% and 13.2%, respectively.
Contribution
It proposes two class-dropping techniques: DropClass, for training speaker embeddings that generalize better, and DropAdapt, for adapting them. Both improve speaker verification performance.
Findings
DropClass achieves a 7.9% relative EER improvement on VoxCeleb.
DropAdapt yields a 13.2% relative EER improvement on VoxCeleb.
Both methods improve speaker verification performance.
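The figures above are relative, not absolute, EER changes. As a minimal sketch of how such a figure is usually computed (the function name and the EER values below are hypothetical and are not taken from the paper):

```python
def relative_eer_improvement(baseline_eer, new_eer):
    """Relative EER improvement in percent; both inputs use the same unit."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Hypothetical values for illustration only (not results from the paper):
# a drop from 5.0% EER to 4.6% EER is roughly an 8% relative improvement.
improvement = relative_eer_improvement(5.0, 4.6)
print(f"{improvement:.1f}% relative improvement")
```

A relative figure depends on the baseline it is measured against, so the two percentages are only comparable if the baselines are known.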
Abstract
Many recent works on deep speaker embeddings train their feature extraction networks on large classification tasks, distinguishing between all speakers in a training set. Empirically, this has been shown to produce speaker-discriminative embeddings, even for unseen speakers. However, it is not clear that this is the optimal means of training embeddings that generalize well. This work proposes two approaches to learning embeddings, based on the notion of dropping classes during training. We demonstrate that both approaches can yield performance gains in speaker verification tasks. The first proposed method, DropClass, works via periodically dropping a random subset of classes from the training data and the output layer throughout training, resulting in a feature extractor trained on many different classification tasks. Combined with an additive angular margin loss, this method can yield…
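The class-dropping step that the abstract describes for DropClass can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the list-based data layout, and the choice of how many classes to drop are all hypothetical, and the abstract does not specify the drop schedule.

```python
import random


def sample_kept_classes(num_classes, num_drop, rng):
    """Randomly drop num_drop classes; return the sorted ids of the rest."""
    dropped = set(rng.sample(range(num_classes), num_drop))
    return [c for c in range(num_classes) if c not in dropped]


def apply_class_drop(examples, output_rows, kept):
    """Restrict the training data and the output layer to the kept classes.

    examples:    list of (features, class_id) pairs
    output_rows: list with one output-layer weight row per class
    kept:        class ids that remain in the current classification task
    """
    # Re-index kept classes as 0..len(kept)-1 to match the reduced layer.
    remap = {old: new for new, old in enumerate(kept)}
    kept_examples = [(x, remap[y]) for x, y in examples if y in remap]
    kept_rows = [output_rows[c] for c in kept]
    return kept_examples, kept_rows


# Hypothetical usage: 10 speakers, drop 4 of them for the next training period.
rng = random.Random(0)
kept = sample_kept_classes(num_classes=10, num_drop=4, rng=rng)
examples = [([float(c)], c) for c in range(10)]
rows = [[0.1 * c, 0.2 * c] for c in range(10)]
sub_examples, sub_rows = apply_class_drop(examples, rows, kept)
print(len(sub_examples), len(sub_rows))
```

In a training loop, the sampling would be repeated periodically so that the feature extractor sees many different classification tasks, while the full set of output rows is retained between periods and only a subset is active at a time.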
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
