Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances

Chang Zeng; Xin Wang; Erica Cooper; Xiaoxiao Miao; Junichi Yamagishi

arXiv:2104.01541 · eess.AS · October 25, 2022

TL;DR

This paper introduces an attention-based back-end for speaker verification that makes use of multiple enrollment utterances and, in experiments on CNCeleb and VoxCeleb, improves accuracy over conventional back-ends such as PLDA and cosine similarity.

Contribution

The paper proposes an attention back-end that uses scaled-dot self-attention and feed-forward self-attention networks to learn the intra-relationships among a speaker's enrollment utterances.
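
To make the idea of scaled-dot self-attention over enrollment utterances concrete, here is a minimal sketch in plain Python. It is not the authors' model: it assumes queries, keys, and values are the raw utterance embeddings (no learned projections), omits the feed-forward self-attention network, and uses mean pooling to get one speaker vector. The function name `self_attention_pool` and the toy embeddings are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_pool(embeddings):
    """Scaled dot-product self-attention over K enrollment embeddings,
    followed by mean pooling into a single speaker vector.

    Assumption: queries, keys, and values are the embeddings themselves;
    a trained back-end would apply learned projections first.
    """
    d = len(embeddings[0])
    scale = math.sqrt(d)
    attended = []
    for q in embeddings:
        # Similarity of this utterance to every enrollment utterance.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in embeddings]
        weights = softmax(scores)
        # Weighted sum of all enrollment embeddings.
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, embeddings)) for i in range(d)]
        )
    # Average the attended vectors into one speaker representation.
    return [sum(row[i] for row in attended) / len(attended) for i in range(d)]


# Toy 2-dimensional embeddings for three enrollment utterances (made up).
enroll = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
speaker = self_attention_pool(enroll)
```

With a single enrollment utterance the attention weights collapse to 1 and the function returns that utterance's embedding unchanged, so the same code path covers the single-enrollment case.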

Findings

01

Lower EER and minDCF scores on CNCeleb with multiple enrollments

02

Effective for both text-independent and text-dependent verification

03

Applicable even with a single enrollment utterance
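
For readers unfamiliar with the EER metric named in the findings, the sketch below shows one simple way to estimate it from trial scores. It is a coarse threshold sweep written for illustration, not the evaluation tooling used in the paper, and it does not cover minDCF. The function name `equal_error_rate` and the example scores are hypothetical.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the equal error rate (EER) by sweeping thresholds.

    A trial is accepted when its score is >= the threshold.
    The EER is taken at the threshold where the false rejection rate (FRR)
    and false acceptance rate (FAR) are closest, as their average.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # Target trials wrongly rejected at this threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # Non-target trials wrongly accepted at this threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Made-up scores: the two classes are perfectly separated here.
print(equal_error_rate([0.9, 0.8], [0.1, 0.2]))
```

A lower EER means the back-end's scores separate same-speaker trials from different-speaker trials more cleanly.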

Abstract

Probabilistic linear discriminant analysis (PLDA) and cosine similarity have been widely used in traditional speaker verification systems as back-end techniques to measure pairwise similarities. To make better use of multiple enrollment utterances, we propose a novel attention back-end model, which can be used for both text-independent (TI) and text-dependent (TD) speaker verification, and employ scaled-dot self-attention and feed-forward self-attention networks as architectures that learn the intra-relationships of the enrollment utterances. To verify the proposed attention back-end, we conduct a series of experiments on the CNCeleb and VoxCeleb datasets by combining it with several state-of-the-art speaker encoders, including TDNN and ResNet. Experimental results using multiple enrollment utterances on CNCeleb show that the proposed attention back-end model leads to lower EER and…
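
As a point of comparison, a cosine-similarity back-end can handle multiple enrollment utterances by averaging their embeddings before scoring. The sketch below illustrates that common baseline; the abstract does not state how the cosine baseline aggregates enrollments, so the averaging step is an assumption, and the names `cosine` and `score_by_mean_embedding` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_by_mean_embedding(enroll_embeddings, test_embedding):
    """Average the enrollment embeddings, then score against the test one.

    Assumption: simple averaging; every enrollment utterance gets equal
    weight regardless of how it relates to the others.
    """
    d = len(test_embedding)
    n = len(enroll_embeddings)
    mean = [sum(e[i] for e in enroll_embeddings) / n for i in range(d)]
    return cosine(mean, test_embedding)


# Made-up 2-dimensional embeddings.
score = score_by_mean_embedding([[1.0, 0.0], [0.8, 0.2]], [0.9, 0.1])
```

Because every enrollment utterance is weighted equally here, the baseline cannot model relationships among them, which is the gap the attention back-end is described as addressing.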

Code & Models

Repositories

nii-yamagishilab/Attention_Backend_for_ASV
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing

Methods: Residual Connection · 1x1 Convolution · Average Pooling · Residual Block · Batch Normalization · Bottleneck Residual Block · Max Pooling · Convolution · Kaiming Initialization