Time-domain speech super-resolution with GAN based modeling for telephony speaker verification

Saurabh Kataria; Jesús Villalba; Laureano Moro-Velázquez; Piotr Żelasko; Najim Dehak

arXiv:2209.01702·eess.AS·September 7, 2022·IEEE ACM Trans. Audio Speech Lang. Process.·1 cites

Open Access

TL;DR

This paper explores time-domain GAN-based models for bandwidth extension to improve telephony speaker verification, demonstrating significant performance gains and analyzing various quality and embedding effects.

Contribution

It introduces time-domain GAN models for bandwidth extension in speaker verification, extending prior feature-domain approaches with comprehensive experiments and new test-time schemes.

Findings

01. GAN-based bandwidth extension improves speaker verification accuracy.

02. Time-domain models outperform feature-domain methods.

03. Bandwidth extension shifts embeddings towards wideband signals.

Abstract

Automatic Speaker Verification (ASV) technology has become commonplace in virtual assistants. However, its performance suffers when there is a mismatch between the train and test domains. Mixed bandwidth training, i.e., pooling training data from both domains, is a preferred choice for developing a universal model that works for both narrowband and wideband domains. We propose complementing this technique by performing neural upsampling of narrowband signals, also known as bandwidth extension. Our main goal is to discover and analyze high-performing time-domain Generative Adversarial Network (GAN) based models to improve our downstream state-of-the-art ASV system. We choose GANs since they (1) are powerful for learning conditional distributions and (2) allow flexible plug-in usage as a pre-processor during the training of the downstream task (ASV) with data augmentation. Prior works mainly…
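To make the notion of upsampling a narrowband signal concrete, the following is a minimal sketch of the slot a bandwidth-extension model would occupy. It is not the paper's method: the GAN generator is replaced here by naive linear interpolation, which doubles the sample rate but cannot synthesize the missing high-frequency content. The 8 kHz and 16 kHz rates are the conventional telephony and wideband rates and are an assumption, as the abstract does not state them; the function names `make_tone` and `upsample_2x_linear` are hypothetical.

```python
import math

# Assumed sample rates (not stated in the abstract): conventional
# narrowband telephony and wideband speech rates.
NARROWBAND_SR = 8000
WIDEBAND_SR = 16000


def make_tone(freq_hz, sample_rate, n_samples):
    """Generate a sine tone as a list of floats (stand-in for speech)."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]


def upsample_2x_linear(x):
    """Naive 2x upsampling by linear interpolation.

    Hypothetical placeholder for a learned bandwidth-extension model:
    it doubles the number of samples but adds no new high-band content.
    """
    y = []
    for i, s in enumerate(x):
        y.append(s)
        # Hold the last sample, since it has no right-hand neighbour.
        nxt = x[i + 1] if i + 1 < len(x) else s
        y.append(0.5 * (s + nxt))
    return y


# 10 ms of a 440 Hz tone at the narrowband rate.
narrow = make_tone(440.0, NARROWBAND_SR, 80)
# The same 10 ms, now represented at the wideband rate.
wide = upsample_2x_linear(narrow)
```

In a pipeline of the kind the abstract describes, a trained time-domain generator would take the place of `upsample_2x_linear` and act as a pre-processor in front of the speaker verification system.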

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques

Methods: Test