How to Leverage DNN-based speech enhancement for multi-channel speaker verification?

Sandipana Dowerah (MULTISPEECH); Romain Serizel (MULTISPEECH); Denis Jouvet (MULTISPEECH); Mohammad Mohammadamini (LIA); Driss Matrouf (LIA)

arXiv:2210.08834·cs.SD·October 18, 2022

Open Access

TL;DR

This paper benchmarks multichannel speech enhancement techniques, both purely DNN-based and hybrid approaches that combine deep neural networks with signal processing, to improve far-field speaker verification under noisy and reverberant conditions.

Contribution

It introduces a comprehensive benchmark for DNN-based and hybrid speech enhancement methods tailored for far-field speaker verification, emphasizing the role of enrollment pre-processing.
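The hybrid family pairs a DNN with classical array processing. One common way to do this, shown here only as an illustration and not necessarily the architecture benchmarked in the paper, is mask-based MVDR beamforming: a DNN predicts a time-frequency speech mask, the mask weights the speech and noise spatial covariance estimates, and an MVDR filter w(f) = Phi_n^-1 Phi_s u / tr(Phi_n^-1 Phi_s) (Souden formulation) is computed per frequency. The function name `mask_based_mvdr`, the `(channels, frames, freqs)` array layout and the diagonal-loading constant are assumptions, and the mask is taken as an input rather than predicted by a trained network.

```python
import numpy as np

def mask_based_mvdr(X, speech_mask, ref_mic=0, loading=1e-6, eps=1e-10):
    """Sketch of mask-based MVDR beamforming (Souden formulation).

    X           : complex STFT, shape (channels, frames, freqs)
    speech_mask : real-valued mask in [0, 1], shape (frames, freqs);
                  in a hybrid system this would come from a DNN
    Returns the single-channel enhanced STFT, shape (frames, freqs).
    """
    C, T, F = X.shape
    noise_mask = 1.0 - speech_mask
    u = np.zeros(C)
    u[ref_mic] = 1.0  # one-hot vector selecting the reference microphone
    Y = np.zeros((T, F), dtype=complex)
    for f in range(F):
        Xf = X[:, :, f]  # (channels, frames)
        # Mask-weighted spatial covariance matrices of speech and noise
        phi_s = (speech_mask[:, f] * Xf) @ Xf.conj().T / (speech_mask[:, f].sum() + eps)
        phi_n = (noise_mask[:, f] * Xf) @ Xf.conj().T / (noise_mask[:, f].sum() + eps)
        # Diagonal loading keeps the noise covariance invertible
        phi_n = phi_n + loading * (np.trace(phi_n).real / C + eps) * np.eye(C)
        ratio = np.linalg.solve(phi_n, phi_s)  # Phi_n^{-1} Phi_s
        w = ratio @ u / (np.trace(ratio) + eps)
        Y[:, f] = w.conj() @ Xf  # apply w^H to every frame
    return Y
```

When the speech covariance is rank one (a single source with steering vector d), this filter is distortionless towards the reference microphone, whatever noise covariance is estimated; the DNN mask mainly controls how cleanly the two covariances are separated.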

Findings

01. Pre-processing improves SV performance, provided the enrollment data are processed in the same way as the test data.
02. Enrollment and test data must lie within similar SNR ranges.
03. Significant improvements are observed on the VOiCES dataset across noise conditions.
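Finding 02 can be turned into a simple check when trials are simulated and the speech and noise components are known separately. The sketch computes the SNR of each side and tests whether enrollment and test fall in the same range. The function names and the 5 dB bin edges in `SNR_EDGES_DB` are hypothetical and are not the ranges used in the paper.

```python
import numpy as np

def snr_db(speech: np.ndarray, noise: np.ndarray) -> float:
    """SNR in dB from separately known speech and noise components."""
    return float(10.0 * np.log10(np.sum(speech ** 2) / np.sum(noise ** 2)))

# Hypothetical SNR bin edges in dB (not taken from the paper).
SNR_EDGES_DB = [0.0, 5.0, 10.0, 15.0, 20.0]

def snr_bin(snr: float, edges=SNR_EDGES_DB) -> int:
    """Index of the SNR range containing `snr` (0 means below the first edge)."""
    return int(np.digitize(snr, edges))

def same_snr_range(enroll_snr: float, test_snr: float) -> bool:
    """True when enrollment and test SNRs fall in the same bin."""
    return snr_bin(enroll_snr) == snr_bin(test_snr)
```

For example, an enrollment at 6 dB and a test at 9.5 dB share the 5 to 10 dB bin, whereas 4 dB and 12 dB do not.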

Abstract

Speaker verification (SV) suffers from unsatisfactory performance in far-field scenarios due to environmental noise and the adverse impact of room reverberation. This work presents a benchmark of multichannel speech enhancement for far-field speaker verification. Two approaches are considered: one is purely deep neural network (DNN)-based, and the other combines a DNN with signal processing. We integrated a DNN architecture with signal processing techniques to carry out various experiments. Our approach is compared to existing state-of-the-art approaches. We examine the importance of pre-processing the enrollment data, which has been largely overlooked in previous studies. Experimental evaluation shows that pre-processing can improve SV performance as long as the enrollment files are processed similarly to the test data and test and enrollment occur within similar SNR ranges. Considerable…
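The central condition in the abstract is that enrollment must go through the same pre-processing as the test data. The sketch below makes that symmetry explicit in a trial-scoring function. It assumes cosine scoring of speaker embeddings (the paper's actual back end may differ), and `average_channels` and `band_energy_embedding` are hypothetical placeholders standing in for a real enhancement front end and a real embedding extractor such as an x-vector network.

```python
import numpy as np
from typing import Callable

def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def score_trial(enroll: np.ndarray, test: np.ndarray,
                enhance: Callable[[np.ndarray], np.ndarray],
                embed: Callable[[np.ndarray], np.ndarray]) -> float:
    """Score one SV trial, passing enrollment AND test through the same front end."""
    enroll_emb = embed(enhance(enroll))  # enrollment is pre-processed too
    test_emb = embed(enhance(test))
    return cosine_score(enroll_emb, test_emb)

# Hypothetical stand-ins, for illustration only.
def average_channels(x: np.ndarray) -> np.ndarray:
    """Placeholder front end: averages time-aligned channels of x (channels, samples)."""
    return x.mean(axis=0)

def band_energy_embedding(y: np.ndarray, dim: int = 64) -> np.ndarray:
    """Placeholder embedding: mean magnitude spectrum in `dim` frequency bands."""
    spec = np.abs(np.fft.rfft(y))
    return np.array([band.mean() for band in np.array_split(spec, dim)])
```

Enhancing only `test` while scoring it against an embedding of raw enrollment audio would reintroduce exactly the enrollment/test mismatch that the evaluation warns about.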

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Indoor and Outdoor Localization Technologies