Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification

Saurabh Kataria; Jesús Villalba; Laureano Moro-Velázquez; Najim Dehak

arXiv:2203.16614 · eess.AS · April 1, 2022

Open Access

TL;DR

This paper introduces a joint learning approach using time-domain GANs to simultaneously perform domain adaptation and bandwidth extension for speech, improving speaker verification accuracy across different acoustic conditions.

Contribution

It proposes a novel joint training framework for domain adaptation and bandwidth extension using GANs, with both parallel and non-parallel data, outperforming separate models.

Findings

01

22% relative EER improvement on SRE16

02

Effective use of both paired and unpaired data

03

First evidence that joint learning outperforms learning the two tasks separately
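
For readers unfamiliar with the metric, the "relative EER improvement" in the first finding is conventionally the reduction in equal error rate expressed as a fraction of the baseline EER. The sketch below illustrates that arithmetic only. The function name and the two EER values are hypothetical and chosen for illustration; they are not figures reported in the paper.

```python
def relative_eer_improvement(baseline_eer: float, new_eer: float) -> float:
    """Return the EER reduction as a fraction of the baseline EER."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical EER values in percent (assumptions, NOT taken from the paper).
baseline = 10.0
improved = 7.8

rel = relative_eer_improvement(baseline, improved)
print(f"Relative EER improvement: {rel:.1%}")
```

A relative figure depends on the baseline, so the same 22% can correspond to very different absolute EER reductions.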

Abstract

Speech systems developed for a particular choice of acoustic domain and sampling frequency do not translate easily to others. The usual practice is to learn domain adaptation and bandwidth extension models independently. In contrast, we propose to learn both tasks together. In particular, we learn to map narrowband conversational telephone speech to wideband microphone speech. We developed parallel and non-parallel learning solutions which utilize both paired and unpaired data. First, we discuss joint and disjoint training of multiple generative models for our tasks. Then, we propose a two-stage learning solution where we use a pre-trained domain adaptation system for pre-processing in bandwidth extension training. We evaluated our schemes on a speaker verification downstream task. We used the JHU-MIT experimental setup for NIST SRE21, which comprises SRE16, SRE-CTS Superset…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing