VAE-based Domain Adaptation for Speaker Verification
Xueyi Wang, Lantian Li, Dong Wang

TL;DR
This paper introduces a VAE-based domain adaptation method for speaker verification that effectively transforms speaker embeddings to match target domain conditions, improving performance in mismatched scenarios.
Contribution
The paper proposes a novel VAE-based approach for domain adaptation of speaker embeddings, enabling effective transformation with minimal target domain data.
Findings
VAE adaptation improves speaker verification accuracy in mismatched domains.
Transforming x-vectors into a regularized latent space facilitates domain adaptation.
The method requires only a small amount of target domain data for effective adaptation.
Abstract
Deep speaker embedding has achieved satisfactory performance in speaker verification. By forcing the neural model to discriminate between the speakers in the training set, deep speaker embeddings (called "x-vectors") can be derived from the hidden layers. Despite its good performance, the present embedding model is highly domain sensitive: it often works well in domains whose acoustic conditions match those of the training data (in-domain), but degrades in mismatched domains (out-of-domain). In this paper, we present a domain adaptation approach based on the Variational Auto-Encoder (VAE). This model transforms x-vectors to a regularized latent space; within this latent space, a small amount of data from the target domain is sufficient to accomplish the adaptation. Our experiments demonstrated that with this VAE-adaptation approach, speaker embeddings can be easily transformed to…
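The abstract does not spell out the model architecture or the adaptation procedure, so the following is only a minimal sketch of the generic VAE ingredients that make a latent space "regularized": the reparameterized sample and the KL term that pulls each latent code toward a standard normal prior. The function names, the 4-dimensional toy vectors, and the example values are all hypothetical and are not taken from the paper.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (standard VAE sampling step)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ): the term that regularizes the latent code."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))


# Toy encoder outputs for one embedding (hypothetical values, not real x-vector statistics).
rng = random.Random(0)
mu = [0.3, -0.1, 0.0, 0.5]
logvar = [0.0, -0.2, 0.1, 0.0]

z = reparameterize(mu, logvar, rng)      # latent code passed to a decoder
kl = kl_to_standard_normal(mu, logvar)   # penalty added to the reconstruction loss
```

In a full model, `mu` and `logvar` would come from an encoder network applied to an x-vector, and the KL penalty would be added to a reconstruction loss; how the target-domain data enters the adaptation step is not described in the text above and is left out here.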
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
