Multi-Channel Speaker Verification for Single and Multi-talker Speech

Saurabh Kataria; Shi-Xiong Zhang; Dong Yu

arXiv:2010.12692 · eess.AS · April 12, 2021 · Interspeech

Open Access

TL;DR

This paper improves speaker verification under interfering speakers, noise, and reverberation by combining multi-channel speech features, multi-channel speech enhancement, and contrastive fine-tuning, reaching a 36% relative reduction in equal error rate (EER) on real multi-talker recordings.

Contribution

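It introduces a framework that fuses spectral, spatial, and directional multi-channel features, adds multi-channel speech enhancement and voice activity detection to make full use of supervised training, and applies contrastive-loss fine-tuning to further improve speaker verification.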

Findings

01. 36% relative EER reduction on real multi-talker recordings

02. Consistent improvements with speaker-dependent directional features

03. Additional 8.3% relative EER reduction with contrastive loss fine-tuning
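The findings above are stated as relative reductions in equal error rate (EER). As a minimal sketch of what those quantities mean, the following computes an approximate EER from trial scores and the relative reduction between two EER values. The function names and every number in the usage note are hypothetical illustrations, not values or code from the paper.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over all observed scores."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # False rejection rate: target trials scoring below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: non-target trials scoring at or above it.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    # The EER is the operating point where the two error rates meet.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

def relative_reduction(baseline_eer, new_eer):
    """Relative reduction of new_eer with respect to baseline_eer."""
    return (baseline_eer - new_eer) / baseline_eer
```

For instance, with made-up EERs of 10.0% for a baseline and 6.4% for an improved system, `relative_reduction(10.0, 6.4)` is about 0.36, which is how a "36% relative reduction" would be read.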

Abstract

To improve speaker verification in real scenarios with interfering speakers, noise, and reverberation, we propose to bring together advancements made in multi-channel speech features. Specifically, we combine spectral, spatial, and directional features, which include inter-channel phase difference, multi-channel sinc convolutions, directional power ratio features, and angle features. To maximally leverage supervised learning, our framework is also equipped with multi-channel speech enhancement and voice activity detection. On all simulated, replayed, and real recordings, we observe large and consistent improvements at various degradation levels. On real recordings of multi-talker speech, we achieve a 36% relative reduction in equal error rate relative to a single-channel baseline. We find the improvements from speaker-dependent directional features more consistent in multi-talker conditions…
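One of the spatial features named in the abstract is the inter-channel phase difference (IPD). The sketch below shows one common way to compute it for a single microphone pair: take the short-time Fourier transform of each channel and encode the wrapped phase difference as cosine and sine. The frame length, hop size, cos/sin encoding, and function names are assumptions for illustration; the paper's exact formulation and choice of microphone pairs are not given in this summary.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Simple Hann-windowed STFT; returns an array (n_frames, frame_len // 2 + 1)."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)

def ipd_features(ch_a, ch_b, frame_len=512, hop=256):
    """Cosine and sine of the phase difference between two channels (assumed encoding)."""
    spec_a = stft(ch_a, frame_len, hop)
    spec_b = stft(ch_b, frame_len, hop)
    # The angle of a * conj(b) equals phase(a) - phase(b), wrapped to (-pi, pi].
    ipd = np.angle(spec_a * np.conj(spec_b))
    return np.cos(ipd), np.sin(ipd)
```

Encoding the difference as a cosine/sine pair avoids the discontinuity at plus or minus pi that a raw phase value would have. Both outputs have one value per time frame and frequency bin, so they can be stacked alongside spectral features.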

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing