NPU Speaker Verification System for INTERSPEECH 2020 Far-Field Speaker Verification Challenge

Li Zhang; Jian Wu; Lei Xie

arXiv:2008.03521 · eess.AS · August 11, 2020 · 1 citation

Open Access

TL;DR

This paper presents a speaker verification system for the INTERSPEECH 2020 Far-Field Speaker Verification Challenge. It introduces a new ResNet-BAM embedding architecture and applies domain adversarial training, signal processing, and data augmentation to improve accuracy in far-field scenarios with short utterances and channel mismatch.

Contribution

The paper introduces ResNet-BAM, a novel speaker embedding architecture, and explores domain adversarial training, signal processing, and data augmentation to enhance far-field speaker verification.
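To make the idea of a bottleneck attention module concrete, here is a minimal pure-Python sketch of the general BAM refinement step, in which a feature map F is rescaled as F + F * sigmoid(channel_logits + spatial_logits). It is an illustration under stated assumptions, not the authors' implementation: the function and weight names (`bam_refine`, `w_down`, `w_up`, `w_spatial`) are hypothetical, the spatial branch is simplified to a single 1x1 projection, batch normalization is omitted, and the weights are random rather than trained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def bam_refine(feat, w_down, w_up, w_spatial):
    """Refine a feature map given as nested lists [channel][height][width]."""
    n_ch, height, width = len(feat), len(feat[0]), len(feat[0][0])

    # Channel branch: global average pooling, then a two-layer bottleneck MLP.
    pooled = [sum(sum(row) for row in feat[c]) / (height * width)
              for c in range(n_ch)]
    hidden = [max(0.0, h) for h in matvec(w_down, pooled)]  # ReLU
    chan_logit = matvec(w_up, hidden)  # one logit per channel

    # Spatial branch (simplified assumption): a 1x1 projection across channels,
    # giving one logit per (height, width) position.
    spat_logit = [[sum(w_spatial[c] * feat[c][i][j] for c in range(n_ch))
                   for j in range(width)] for i in range(height)]

    # Broadcast-add both branches, squash to (0, 1), apply as a residual gate.
    return [[[feat[c][i][j] * (1.0 + sigmoid(chan_logit[c] + spat_logit[i][j]))
              for j in range(width)] for i in range(height)]
            for c in range(n_ch)]


# Toy input: 4 channels, 3 x 5 positions, channel reduction ratio 2.
random.seed(0)
C, H, W, R = 4, 3, 5, 2
feat = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)]
        for _ in range(C)]
w_down = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // R)]
w_up = [[random.uniform(-1, 1) for _ in range(C // R)] for _ in range(C)]
w_spatial = [random.uniform(-1, 1) for _ in range(C)]
refined = bam_refine(feat, w_down, w_up, w_spatial)
```

Because the gate lies strictly between 0 and 1, every output value keeps the sign of its input and is scaled by a factor between 1 and 2, so the module can emphasize features without discarding them.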

Findings

01. ResNet-BAM reduces EER by up to 1%.

02. Domain adversarial training reduces EER by 0.8%.

03. Data augmentation with data selection reduces EER by 2%.
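All three findings are stated in terms of equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The sketch below shows one simple way to estimate it from trial scores by sweeping every observed score as a threshold. The function name `compute_eer` and the toy scores are made up for illustration and are not results from the paper; challenge scoring tools may interpolate between thresholds differently.

```python
def compute_eer(target_scores, nontarget_scores):
    """Return (eer, threshold), sweeping each observed score as a threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # False rejection: a genuine (target) trial scores below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scores at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Hypothetical similarity scores for four target and four non-target trials.
targets = [0.9, 0.8, 0.7, 0.35]
nontargets = [0.6, 0.4, 0.3, 0.2]
eer, threshold = compute_eer(targets, nontargets)
print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.6
```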

Abstract

This paper describes the NPU system submitted to the INTERSPEECH 2020 Far-Field Speaker Verification Challenge (FFSVC). We focus on far-field text-dependent speaker verification (SV) from a single microphone array (task 1) and from multiple microphone arrays (task 3). The major challenges in such scenarios are short utterances and the cross-channel and distance mismatch between enrollment and test. Believing that a better speaker embedding can alleviate the effects of short utterances, we introduce a new speaker embedding architecture, ResNet-BAM, which integrates a bottleneck attention module with ResNet as a simple and efficient way to further improve the representation power of ResNet. This contribution brings up to a 1% reduction in equal error rate (EER). We further address the mismatch problem in three directions. First, domain adversarial training, which aims to learn domain-invariant features, can yield a 0.8% EER reduction. Second, front-end…
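The abstract does not say how domain adversarial training is implemented. One common way to learn domain-invariant features is a gradient reversal layer, which acts as the identity in the forward pass and multiplies the gradient by a negative factor in the backward pass. The toy scalar model below is a sketch under that assumption, with gradients worked out by hand; the names `domain_grads` and `lam` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def domain_grads(w, v, x, domain_label, lam):
    """Toy model: feature f = w * x, domain logit d = v * f, binary
    cross-entropy loss on the domain label (e.g. 0 = near, 1 = far)."""
    f = w * x
    p = sigmoid(v * f)
    dloss_dlogit = p - domain_label      # derivative of BCE w.r.t. the logit
    grad_v = dloss_dlogit * f            # domain classifier: ordinary gradient
    grad_f = dloss_dlogit * v            # gradient arriving at the feature
    grad_w_plain = grad_f * x            # extractor gradient without reversal
    grad_w_reversed = -lam * grad_w_plain  # reversal: flip sign, scale by lam
    return grad_v, grad_w_plain, grad_w_reversed


grad_v, grad_w_plain, grad_w_reversed = domain_grads(
    w=0.5, v=1.2, x=2.0, domain_label=1.0, lam=0.3
)
```

The domain classifier weight `v` is updated to reduce the domain loss, while the feature extractor weight `w` receives the opposite-signed gradient and is pushed toward features from which the domain is harder to predict.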

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing