Domain Agnostic Few-shot Learning for Speaker Verification
Seunghan Yang, Debasmit Das, Janghoon Cho, Hyoungwoo Park, Sungrack Yun

TL;DR
This paper introduces a domain-agnostic few-shot learning framework for speaker verification. It improves generalization to new users and new environments by combining domain-specific and domain-aggregation networks, and it saves memory by clustering similar domains so that fewer domain-specific networks are needed.
Contribution
The paper presents a few-shot domain generalization approach built on domain-specific and domain-aggregation networks, which improves speaker verification performance across diverse domains.
Findings
Improved generalization to new domains and users.
Effective clustering reduces memory requirements.
Enhanced performance on standard benchmarks.
Abstract
Deep learning models for verification systems often fail to generalize to new users and new environments, even though they learn highly discriminative features. To address this problem, we propose a few-shot domain generalization framework that learns to tackle distribution shift for new users and new domains. Our framework consists of domain-specific and domain-aggregation networks, which are the experts on specific and combined domains, respectively. By using these networks, we generate episodes that mimic the presence of both novel users and novel domains in the training phase to eventually produce better generalization. To save memory, we reduce the number of domain-specific networks by clustering similar domains together. Upon extensive evaluation on artificially generated noise domains, we can explicitly show generalization ability of our framework. In addition, we apply our…
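The episode idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data layout (a list of `(user_id, domain_id, features)` tuples), the function name `sample_episode`, and the rule of holding out exactly one domain per episode are all assumptions made for illustration. Each episode draws support utterances from "seen" domains and query utterances from a held-out domain, so the model is scored on users and a domain it was not adapted to within that episode.

```python
import random
from collections import defaultdict


def sample_episode(utterances, n_users=2, n_shots=1, n_queries=1, rng=None):
    """Sample one training episode (hypothetical sketch, not the paper's code).

    utterances: list of (user_id, domain_id, features) tuples (assumed layout).
    Returns (novel_domain, support, query), where support items come only from
    the remaining domains and query items come only from the held-out domain.
    """
    rng = rng or random.Random()
    domains = sorted({domain for _, domain, _ in utterances})
    if len(domains) < 2:
        raise ValueError("need at least two domains to hold one out")

    # Assumption: one domain per episode plays the role of the novel domain.
    novel_domain = rng.choice(domains)

    # Split each user's utterances into seen-domain and novel-domain pools.
    by_user = defaultdict(lambda: {"seen": [], "novel": []})
    for item in utterances:
        user, domain, _ = item
        pool = "novel" if domain == novel_domain else "seen"
        by_user[user][pool].append(item)

    # Keep only users with enough utterances on both sides of the split.
    eligible = sorted(
        user
        for user, pools in by_user.items()
        if len(pools["seen"]) >= n_shots and len(pools["novel"]) >= n_queries
    )
    if len(eligible) < n_users:
        raise ValueError("not enough users with data in both pools")

    # The sampled users act as the episode's "new" users.
    users = rng.sample(eligible, n_users)
    support, query = [], []
    for user in users:
        support.extend(rng.sample(by_user[user]["seen"], n_shots))
        query.extend(rng.sample(by_user[user]["novel"], n_queries))
    return novel_domain, support, query
```

In a training loop, the support set would be used to enroll each sampled user and the query set to compute the verification loss; how the domain-specific and domain-aggregation networks consume these sets is not specified in the abstract and is left out of the sketch.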
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
