Attention-based multi-channel speaker verification with ad-hoc microphone arrays

Chengdong Liang; Junqi Chen; Shanzheng Guan; Xiao-Lei Zhang

arXiv:2107.00178·cs.SD·July 2, 2021·6 cites

Open Access

TL;DR

This paper introduces an attention-based multi-channel speaker verification system designed for ad-hoc microphone arrays with unknown configurations, employing residual self-attention and sparsemax to improve robustness and accuracy.

Contribution

It proposes a novel neural network architecture with inter-channel processing and global fusion layers, incorporating sparsemax for better noise handling in ad-hoc microphone arrays.
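As a rough illustration of why sparsemax can help with noisy channels, here is a minimal NumPy sketch of the standard sparsemax projection. It is not taken from the paper's code; the function name and the example scores are assumptions made for illustration. Unlike softmax, sparsemax can assign exactly zero weight to low-scoring entries, which in this setting would correspond to ignoring some microphones entirely.

```python
import numpy as np

def sparsemax(z):
    """Project a 1-D score vector onto the probability simplex (sparsemax).

    Illustrative sketch of the standard algorithm, not the paper's implementation.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]              # scores in descending order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum      # entries that stay non-zero
    k_z = k[support][-1]                     # size of the support
    tau = (cumsum[k_z - 1] - 1) / k_z        # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

# Hypothetical attention scores for three microphones.
weights = sparsemax([1.0, 0.8, 0.1])
print(weights)  # the lowest-scoring channel receives exactly zero weight
```

The output still sums to one, so it can stand in for softmax wherever normalized attention weights are expected.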

Findings

1. Achieves over 20% EER reduction on semi-real data
2. Achieves over 30% EER reduction on simulated data
3. Effective in scenarios with varying and mismatched channel numbers
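For readers unfamiliar with the metric, the sketch below shows one simple way to estimate an equal error rate (EER) from trial scores and to express an improvement as a relative reduction. It assumes the percentages above are relative reductions, which this page does not state explicitly; the function names and the example numbers are hypothetical and do not come from the paper.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by scanning every observed score as a threshold."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # False rejection: a target trial scores below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance: a non-target trial scores at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))         # point where the two rates are closest
    return (far[i] + frr[i]) / 2

def relative_reduction(baseline_eer, new_eer):
    """Fraction by which new_eer improves on baseline_eer."""
    return (baseline_eer - new_eer) / baseline_eer

# Made-up numbers: a drop from 10% to 8% EER is a 20% relative reduction.
print(relative_reduction(0.10, 0.08))
```

This threshold scan is a coarse estimate; evaluation toolkits typically interpolate between operating points.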

Abstract

Ad-hoc microphone arrays have recently been widely studied. Unlike traditional microphone arrays, the spatial arrangement and number of microphones in an ad-hoc array are not known in advance, which hinders the adaptation of traditional speaker verification technologies to ad-hoc microphone arrays. To overcome this weakness, we propose attention-based multi-channel speaker verification with ad-hoc microphone arrays. Specifically, we add an inter-channel processing layer and a global fusion layer after the pooling layer of a single-channel speaker verification system. The inter-channel processing layer applies a so-called residual self-attention along the channel dimension to allocate weights to different microphones. The global fusion layer integrates all channels in a way that is independent of the number of input channels. We further replace the…
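The two added layers can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: each channel is assumed to arrive as one utterance-level embedding from the single-channel pooling layer, the attention uses plain softmax with randomly initialized (hypothetical) projection matrices `w_q`, `w_k`, `w_v`, and the global fusion is approximated by mean pooling over channels, chosen here only because it is independent of the channel count.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def residual_channel_attention(x, w_q, w_k, w_v):
    """Self-attention along the channel axis with a residual connection.

    x: (C, D) array, one D-dimensional embedding per microphone channel.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (C, C) channel-to-channel scores
    weights = softmax(scores, axis=-1)        # each row sums to one
    return x + weights @ v                    # residual: input plus attended values

def fuse_channels(x, w_q, w_k, w_v):
    """Attend across channels, then mean-pool (assumed stand-in for global fusion)."""
    return residual_channel_attention(x, w_q, w_k, w_v).mean(axis=0)

rng = np.random.default_rng(0)
D, d_k = 8, 4                                 # hypothetical embedding and key sizes
w_q = rng.standard_normal((D, d_k))
w_k = rng.standard_normal((D, d_k))
w_v = rng.standard_normal((D, D))             # values keep size D so the residual adds up

for num_channels in (2, 5):                   # same weights, different array sizes
    x = rng.standard_normal((num_channels, D))
    print(num_channels, fuse_channels(x, w_q, w_k, w_v).shape)
```

Because no parameter depends on the number of channels, the same weights handle arrays of any size, and the fused embedding does not change if the microphones are reordered.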

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing