Speaker Verification using Convolutional Neural Networks
Hossein Salehghaffari

TL;DR
This paper introduces a novel CNN-based approach for speaker verification that uses a Siamese framework to enhance discriminative power and robustness, outperforming traditional methods.
Contribution
The paper proposes a new CNN architecture combined with Siamese fine-tuning to improve speaker verification accuracy and robustness against within-speaker variations.
Findings
Outperforms traditional speaker verification methods
Creates more discriminative speaker models
Enhances robustness to speaker variation
Abstract
In this paper, a novel Convolutional Neural Network architecture is developed for speaker verification that simultaneously captures speaker-related information and discards non-speaker information. In the training phase, the network is trained to distinguish between different speaker identities in order to create the background model. A crucial step is then creating the speaker models. Most previous approaches create speaker models by averaging the speaker representations provided by the background model. We overcome this problem by further fine-tuning the trained model using the Siamese framework, generating a discriminative feature space that distinguishes between same and different speakers regardless of their identity. This provides a mechanism which simultaneously captures the speaker-related information and creates robustness to within-speaker variations. It is…
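The two ideas contrasted in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of cosine similarity for scoring, and the contrastive loss with a `margin` parameter are all assumptions chosen because they are common in Siamese setups; the abstract does not say which loss or scoring function the author uses. Embeddings are plain lists of floats standing in for CNN outputs.

```python
import math


def average_speaker_model(embeddings):
    """Baseline described in the abstract: a speaker model built as the
    element-wise mean of that speaker's enrollment embeddings."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def cosine_similarity(a, b):
    """Assumed scoring function between a test embedding and a speaker model."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_speaker, margin=1.0):
    """Assumed Siamese objective (contrastive loss): pull same-speaker pairs
    together, push different-speaker pairs at least `margin` apart."""
    d = euclidean_distance(emb_a, emb_b)
    if same_speaker:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


# Hypothetical 2-D embeddings for illustration only.
enrollment = [[1.0, 0.0], [0.0, 1.0]]
model = average_speaker_model(enrollment)
score = cosine_similarity(model, [1.0, 1.0])
```

Under these assumptions, the loss depends only on whether a pair shares a speaker, not on which speaker it is, which matches the abstract's description of a feature space that separates same and different speakers "regardless of their identity".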
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
