Time-domain speech super-resolution with GAN based modeling for telephony speaker verification
Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Piotr Żelasko, Najim Dehak

TL;DR
This paper explores time-domain GAN-based models for bandwidth extension to improve telephony speaker verification. It demonstrates significant performance gains and analyzes how bandwidth extension affects signal quality and speaker embeddings.
Contribution
It introduces time-domain GAN models for bandwidth extension in speaker verification, extending prior feature-domain approaches with comprehensive experiments and new test-time schemes.
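For orientation, a generic time-domain GAN for bandwidth extension can be written as below. This is one common formulation, not necessarily the losses used in the paper, which this summary does not specify. Let x be a narrowband waveform resampled to the wideband rate, y the matching wideband waveform, G the generator (mapping x to an estimate of y) and D the discriminator. Combining a least-squares adversarial loss with an L1 waveform reconstruction term weighted by a hyperparameter λ (an assumption for illustration) gives:

```latex
\begin{aligned}
\mathcal{L}_D &= \mathbb{E}_{y}\!\left[(D(y) - 1)^2\right] + \mathbb{E}_{x}\!\left[D(G(x))^2\right],\\
\mathcal{L}_G &= \mathbb{E}_{x}\!\left[(D(G(x)) - 1)^2\right] + \lambda\, \mathbb{E}_{x,y}\!\left[\lVert y - G(x) \rVert_1\right].
\end{aligned}
```

D is pushed to output 1 on real wideband audio and 0 on generated audio. G is pushed to make D output 1 on its outputs while staying close to the reference waveform. Because G needs only x at inference time, it can sit in front of an ASV system as a pre-processor.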
Findings
GAN-based bandwidth extension improves speaker verification accuracy.
Time-domain models outperform feature-domain methods.
Bandwidth extension shifts the speaker embeddings of narrowband signals toward those of wideband signals.
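The embedding-shift finding suggests a simple paired check. Embed each narrowband test utterance with and without bandwidth extension, then compare both against the embedding of the matching wideband recording. The minimal sketch below assumes such wideband counterparts exist (e.g., when the narrowband audio is simulated from wideband data) and that embeddings are plain vectors. The helper names and toy values are hypothetical, and this is not necessarily the analysis performed in the paper.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_similarity_to_wideband(test_embeddings, wideband_embeddings) -> float:
    """Average cosine similarity between each test embedding and the wideband
    embedding of the same utterance (hypothetical helper; lists must be
    aligned utterance by utterance)."""
    if len(test_embeddings) != len(wideband_embeddings):
        raise ValueError("embeddings must be paired utterance by utterance")
    sims = [cosine_similarity(e, w) for e, w in zip(test_embeddings, wideband_embeddings)]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    # Made-up 3-dimensional "embeddings" for two utterances, for illustration only.
    wideband = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    narrowband = [[0.6, 0.8, 0.0], [0.8, 0.6, 0.0]]
    extended = [[0.8, 0.6, 0.0], [0.6, 0.8, 0.0]]
    print("narrowband:", mean_similarity_to_wideband(narrowband, wideband))
    print("after BWE: ", mean_similarity_to_wideband(extended, wideband))
```

A higher average after bandwidth extension would be consistent with embeddings moving toward the wideband domain. With real speaker embeddings one would usually inspect the whole distribution of similarities, not only the mean.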
Abstract
Automatic Speaker Verification (ASV) technology has become commonplace in virtual assistants. However, its performance suffers when there is a mismatch between the training and test domains. Mixed bandwidth training, i.e., pooling training data from both domains, is a preferred choice for developing a universal model that works for both narrowband and wideband domains. We propose complementing this technique by performing neural upsampling of narrowband signals, also known as bandwidth extension. Our main goal is to discover and analyze high-performing time-domain Generative Adversarial Network (GAN) based models to improve our downstream state-of-the-art ASV system. We choose GANs since they (1) are powerful for learning conditional distributions and (2) allow flexible plug-in usage as a pre-processor during training of the downstream task (ASV) with data augmentation. Prior works mainly…
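Supervised bandwidth extension needs paired narrowband/wideband audio, and one common way to get such pairs is to simulate narrowband speech from wideband recordings. The minimal sketch below assumes 16 kHz wideband input and an 8 kHz telephone band. It low-passes at 3.4 kHz, decimates to 8 kHz, then interpolates back to 16 kHz so that a time-domain model sees input and target at the same rate. The function names are hypothetical and the filter is a plain Hamming-windowed sinc. Real telephony simulation would also involve codecs and channel effects, which are omitted here.

```python
import math

WIDEBAND_FS = 16000  # assumed wideband sampling rate (Hz); narrowband is half of it


def windowed_sinc_lowpass(num_taps: int, cutoff: float) -> list[float]:
    """Hamming-windowed sinc low-pass FIR with unity DC gain.

    `cutoff` is normalized to the sampling rate (0 < cutoff < 0.5).
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter has a center tap")
    mid = num_taps // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response sin(2*pi*fc*k) / (pi*k), equal to 2*fc at k = 0.
        ideal = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(x: list[float], taps: list[float]) -> list[float]:
    """Convolution trimmed to len(x) and centered on the middle tap (no delay)."""
    mid = len(taps) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = i + mid - j
            if 0 <= idx < len(x):
                acc += h * x[idx]
        out.append(acc)
    return out


def simulate_narrowband_pair(wideband: list[float], band_edge_hz: float = 3400.0,
                             num_taps: int = 63):
    """Return (model_input, target) for a 16 kHz time-domain BWE model (hypothetical helper)."""
    lp = windowed_sinc_lowpass(num_taps, band_edge_hz / WIDEBAND_FS)
    # 16 kHz -> 8 kHz: anti-aliasing low-pass, then keep every other sample.
    narrowband_8k = fir_filter(wideband, lp)[::2]
    # 8 kHz -> 16 kHz: insert zeros between samples, then interpolate with the
    # same low-pass; the factor 2 restores the amplitude lost to zero insertion.
    stuffed = []
    for s in narrowband_8k:
        stuffed.extend([s, 0.0])
    upsampled = [2.0 * v for v in fir_filter(stuffed, lp)]
    return upsampled[: len(wideband)], wideband


def rms(v: list[float]) -> float:
    return math.sqrt(sum(s * s for s in v) / len(v))


if __name__ == "__main__":
    tone_500 = [math.sin(2 * math.pi * 500 * t / WIDEBAND_FS) for t in range(1600)]
    tone_6k = [math.sin(2 * math.pi * 6000 * t / WIDEBAND_FS) for t in range(1600)]
    # The 500 Hz tone survives the narrowband round trip; the 6 kHz tone is removed,
    # i.e. it lies in the high band a BWE model would have to regenerate.
    print(rms(simulate_narrowband_pair(tone_500)[0][100:-100]))
    print(rms(simulate_narrowband_pair(tone_6k)[0][100:-100]))
```

In practice a library resampler such as scipy.signal.resample_poly would replace the hand-written filter; the explicit loops here only make each step visible.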
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques
