TL;DR
This paper introduces Curricular SincNet, a deep speaker recognition model that emphasizes hard samples in latent space using a curriculum loss, leading to improved robustness and lower error rates across multiple datasets.
Contribution
The paper proposes a novel curriculum loss function for SincNet, enhancing speaker recognition by focusing on hard samples and imposing inter-class margins.
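The summary does not give the loss formula, so the following is only a minimal NumPy sketch of how a curriculum, angular-margin loss can emphasize hard samples. It assumes a CurricularFace-style formulation: the true-class angle receives an additive margin, and negative classes that score above the penalized target are re-weighted by a running parameter t. The function names, the hyperparameters (margin, scale, momentum), and the toy sizes are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def curricular_logits(embeddings, class_weights, labels, t,
                      margin=0.5, scale=30.0, momentum=0.01):
    """Return scaled logits and the updated curriculum parameter t.

    Assumed CurricularFace-style formulation; all defaults are illustrative.
    """
    # Normalise embeddings (rows) and class centres (columns) so dot products are cosines.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=0, keepdims=True)
    cos = np.clip(emb @ w, -1.0, 1.0)  # shape: (batch, num_classes)

    rows = np.arange(len(labels))
    cos_y = cos[rows, labels]          # cosine to the true class
    theta_y = np.arccos(cos_y)

    # Angular margin on the true class: cos(theta_y + m). If theta_y + m would
    # pass pi, use a linear penalty instead so the logit keeps decreasing with the angle.
    cos_y_m = np.where(theta_y + margin < np.pi,
                       np.cos(theta_y + margin),
                       cos_y - margin * np.sin(margin))

    # Curriculum parameter: moving average of the mean true-class cosine.
    t = momentum * cos_y.mean() + (1.0 - momentum) * t

    # A negative class is "hard" when it scores above the margin-penalised target.
    hard = cos > cos_y_m[:, None]
    logits = np.where(hard, cos * (t + cos), cos)  # re-weight hard negatives by (t + cos)
    logits[rows, labels] = cos_y_m
    return scale * logits, t

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy usage with random data (hypothetical sizes: 4 utterances, 8-dim embeddings, 3 speakers).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 8))
class_weights = rng.normal(size=(8, 3))
labels = np.array([0, 2, 1, 0])
logits, t = curricular_logits(embeddings, class_weights, labels, t=0.0)
loss = cross_entropy(logits, labels)
```

In this sketch t starts near zero, so hard negatives are initially down-weighted, and it grows as the true-class cosines improve, which shifts emphasis toward hard samples later in training. A plain softmax loss has neither the margin nor this re-weighting.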
Findings
Achieves the best overall results in inter-dataset testing, with a 4% lower error rate than SincNet and other published work.
Performs competitively on multiple datasets in intra- and inter-dataset evaluations.
Outperforms previous SincNet models and published work in robustness and accuracy.
Abstract
Deep learning models have become an increasingly preferred option for biometric recognition systems, such as speaker recognition. SincNet, a deep neural network architecture, gained popularity in speaker recognition tasks due to its parameterized sinc functions that allow it to work directly on the speech signal. The original SincNet architecture uses the softmax loss, which may not be the most suitable choice for recognition-based tasks. Such loss functions neither impose inter-class margins nor differentiate between easy and hard training samples. Curriculum learning techniques, particularly those leveraging angular margin-based losses, have proven very successful in other biometric applications such as face recognition. The advantage of such curriculum learning-based techniques is that they impose inter-class margins and take into account both easy and hard samples. In this paper, we…
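To make "parameterized sinc functions" concrete, here is a small NumPy sketch of a band-pass filter defined only by its two cutoff frequencies, built as the difference of two windowed low-pass sinc filters. This follows the general idea described for SincNet; the function name, kernel length, sample rate, Hamming window, and cutoff values are assumptions chosen for illustration and are not stated in this summary.

```python
import numpy as np

def sinc_bandpass(f_low_hz, f_high_hz, kernel_size=251, sample_rate=16000):
    """Band-pass FIR kernel parameterised only by its low and high cutoff frequencies."""
    # Sample indices centred on zero so the kernel is symmetric.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    f1 = f_low_hz / sample_rate   # normalised cutoffs (cycles per sample)
    f2 = f_high_hz / sample_rate
    # np.sinc(x) is sin(pi*x)/(pi*x), so 2*f*np.sinc(2*f*n) is an ideal low-pass at f.
    # Subtracting the two low-pass responses leaves the band between f1 and f2.
    kernel = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # A Hamming window smooths the truncation of the ideal (infinite) response.
    return kernel * np.hamming(kernel_size)

# Illustrative check: a 1 kHz tone lies inside the 300-3400 Hz band, a 6 kHz tone does not.
kernel = sinc_bandpass(300.0, 3400.0)
time = np.arange(16000) / 16000
in_band = np.sin(2 * np.pi * 1000 * time)
out_band = np.sin(2 * np.pi * 6000 * time)
gain_in = np.abs(np.convolve(in_band, kernel, mode="valid")).max()
gain_out = np.abs(np.convolve(out_band, kernel, mode="valid")).max()
```

In a learned front end, the two cutoffs per filter would be the trainable parameters, so each filter costs two values rather than one weight per tap. Here they are fixed to keep the example self-contained.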
Taxonomy
Methods: Softmax
