Speaker Verification using Convolutional Neural Networks

Hossein Salehghaffari

arXiv:1803.05427 · eess.AS · August 13, 2018 · 20 cites

Open Access

TL;DR

This paper introduces a novel CNN-based approach for speaker verification that uses a Siamese framework to enhance discriminative power and robustness, outperforming traditional methods.

Contribution

The paper proposes a new CNN architecture combined with Siamese fine-tuning to improve speaker verification accuracy and robustness against within-speaker variations.

Findings

01 Outperforms traditional speaker verification methods

02 Creates more discriminative speaker models

03 Enhances robustness to speaker variation

Abstract

In this paper, a novel Convolutional Neural Network architecture is developed for speaker verification that captures speaker-related information while discarding non-speaker information. In the training phase, the network is trained to distinguish between different speaker identities, which yields the background model. A crucial step is then creating the speaker models. Most previous approaches create speaker models by averaging the speaker representations provided by the background model. We address this limitation by further fine-tuning the trained model within a Siamese framework, generating a discriminative feature space that distinguishes between same and different speakers regardless of their identity. This provides a mechanism that simultaneously captures speaker-related information and creates robustness to within-speaker variations. It is…
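The two ideas contrasted in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the use of cosine scoring, and the particular contrastive loss with its `margin` parameter are assumptions chosen for illustration, since the abstract names neither the scoring function nor the loss. The first half shows the averaging baseline for building a speaker model; the second shows a pairwise loss of the kind commonly used to train Siamese networks, which depends only on whether two utterances share a speaker, not on who the speaker is.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_speaker_model(embeddings):
    """Baseline speaker model: element-wise mean of enrollment embeddings.

    `embeddings` is a list of equal-length vectors, assumed here to be
    the representations produced by a trained background model.
    """
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a, b, same_speaker, margin=1.0):
    """Contrastive loss for one pair of embeddings (assumed loss form).

    Same-speaker pairs are pulled together; different-speaker pairs are
    pushed apart until they are at least `margin` away from each other.
    """
    d = euclidean_distance(a, b)
    if same_speaker:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


if __name__ == "__main__":
    # Hypothetical 2-D embeddings, for illustration only.
    enrollment = [[1.0, 0.0], [0.0, 1.0]]
    model = average_speaker_model(enrollment)
    print("speaker model:", model)
    print("score:", cosine_similarity(model, [1.0, 1.0]))
    print("same-speaker loss:", contrastive_loss([0.0, 0.0], [3.0, 4.0], True))
    print("diff-speaker loss:", contrastive_loss([0.0, 0.0], [3.0, 4.0], False))
```

In the averaging baseline, verification reduces to comparing a test embedding against the mean vector. With a pairwise loss, the embedding space itself is trained so that distances between utterances carry the same-or-different decision.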

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing