Multi-query multi-head attention pooling and Inter-topK penalty for speaker verification

Miao Zhao; Yufeng Ma; Yiwei Ding; Yu Zheng; Min Liu; Minqiang Xu

arXiv:2110.05042·cs.SD·October 13, 2021

Open Access

TL;DR

This paper introduces multi-query multi-head attention (MQMHA) pooling, which attends both to the whole feature and to its split parts, and an inter-topK penalty, which adds an extra penalty on confused speakers. Used together, they achieve state-of-the-art results on all public VoxCeleb test sets.

Contribution

The paper proposes a novel multi-query multi-head attention pooling and an inter-topK penalty to improve speaker verification performance.

Findings

01. Achieved state-of-the-art results on VoxCeleb test sets.

02. Demonstrated improved inter-class discriminability.

03. Enhanced speaker representation diversity.

Abstract

This paper describes the multi-query multi-head attention (MQMHA) pooling and inter-topK penalty methods, which were first proposed in our system description submitted to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2021. Most multi-head attention pooling mechanisms either attend to the whole feature through multiple heads or attend to several split parts of the whole feature. Our proposed MQMHA combines both mechanisms and gains more diversified information. Margin-based softmax loss functions are commonly adopted to obtain discriminative speaker representations. To further enhance inter-class discriminability, we propose a method that adds an extra inter-topK penalty on some confused speakers. By adopting both MQMHA and the inter-topK penalty, we achieved state-of-the-art performance on all of the public VoxCeleb test sets.
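
The abstract does not give formulas, so the following is only a minimal sketch of how the two ideas might look, not the authors' implementation. It assumes that each head sees one split part of the frame-level feature, that each head has several queries scored by a plain linear vector, and that pooling returns the attention-weighted mean and standard deviation. For the loss, it assumes an additive margin on the target class and an additive penalty on the K non-target classes with the highest cosine similarity. The names `mqmha_pool`, `inter_topk_logits`, `query_weights`, `margin`, `penalty`, `k`, and `scale`, and all default values, are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mqmha_pool(frames, num_heads, query_weights):
    """Pool T frames of dimension d into one utterance-level vector.

    frames:        list of T frame vectors, each of length d.
    query_weights: query_weights[h][q] is a scoring vector of length
                   d // num_heads (assumed linear scoring, hypothetical).
    Returns a vector of length 2 * Q * d: mean and std per head and query.
    """
    d = len(frames[0])
    assert d % num_heads == 0, "feature dimension must split evenly"
    sub = d // num_heads
    pooled = []
    for h in range(num_heads):
        # Each head attends to its own split part of the feature.
        part = [f[h * sub:(h + 1) * sub] for f in frames]
        # Several queries per head give several attention maps.
        for w in query_weights[h]:
            scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in part]
            alpha = softmax(scores)  # attention weights over time
            mean = [sum(a * x[i] for a, x in zip(alpha, part))
                    for i in range(sub)]
            var = [sum(a * x[i] ** 2 for a, x in zip(alpha, part)) - mean[i] ** 2
                   for i in range(sub)]
            std = [math.sqrt(max(v, 1e-12)) for v in var]
            pooled.extend(mean + std)
    return pooled


def inter_topk_logits(cosines, target, margin=0.2, penalty=0.06, k=5, scale=32.0):
    """Scaled logits with a target margin and an assumed inter-topK penalty.

    cosines: cosine similarity between one embedding and every class centre.
    The k non-target classes with the highest cosine (the most confusable
    speakers) get `penalty` added, which makes them harder to separate.
    """
    rivals = sorted((j for j in range(len(cosines)) if j != target),
                    key=lambda j: cosines[j], reverse=True)[:k]
    hard = set(rivals)
    logits = []
    for j, c in enumerate(cosines):
        if j == target:
            c = c - margin   # additive margin on the true speaker
        elif j in hard:
            c = c + penalty  # extra penalty on confused speakers
        logits.append(scale * c)
    return logits
```

With H heads and Q queries per head, `mqmha_pool` returns a vector of length 2 * Q * d, since every head and query pair contributes a mean and a standard deviation over a d / H slice. The output of `inter_topk_logits` would then be passed to an ordinary softmax cross-entropy loss.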

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing