Meta-learning with Latent Space Clustering in Generative Adversarial Network for Speaker Diarization
Monisankha Pal, Manoj Kumar, Raghuveer Peri, Tae Jin Park, So Hyun Kim, Catherine Lord, Somer Bishop, and Shrikanth Narayanan

TL;DR
This paper introduces a meta-learning enhanced GAN-based approach for speaker diarization that improves robustness and domain adaptation, achieving significant error rate reductions across diverse multi-domain datasets.
Contribution
The work extends ClusterGAN with meta-learning to create MCGAN, enabling rapid adaptation and improved robustness in speaker diarization across challenging environments.
Findings
MCGAN embeddings outperform x-vectors in diarization accuracy.
The proposed method achieves up to 53.93% relative reduction in diarization error rate (DER).
MCGAN improves speaker count estimation and short segment diarization.
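A relative DER reduction compares the absolute DER of a baseline system with that of the proposed system. The sketch below shows the arithmetic only; the function name and the input values are hypothetical illustrations and are not results from the paper.

```python
def relative_der_reduction(baseline_der: float, system_der: float) -> float:
    """Return the relative DER reduction in percent.

    Both arguments are absolute DER values in percent.
    """
    if baseline_der <= 0:
        raise ValueError("baseline DER must be positive")
    return 100.0 * (baseline_der - system_der) / baseline_der


# Hypothetical values for illustration only (not taken from the paper):
# a baseline at 10.0% DER and a new system at 6.0% DER.
example_reduction = relative_der_reduction(10.0, 6.0)
print(f"relative DER reduction: {example_reduction:.2f}%")
```

A relative figure can look large even when the absolute gap is small, so it is usually read alongside the absolute DER values.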
Abstract
Most speaker diarization systems with x-vector embeddings are vulnerable to noisy environments and lack domain robustness. Earlier work on speaker diarization using a generative adversarial network (GAN) with an encoder network (ClusterGAN) to project input x-vectors into a latent space has shown promising performance on meeting data. In this paper, we extend the ClusterGAN network to improve diarization robustness and enable rapid generalization across various challenging domains. To this end, we fetch the pre-trained encoder from the ClusterGAN and fine-tune it by using prototypical loss (meta-ClusterGAN or MCGAN) under the meta-learning paradigm. Experiments are conducted on CALLHOME telephonic conversations, AMI meeting data, DIHARD II (dev set), which includes a challenging multi-domain corpus, and two child-clinician interaction corpora (ADOS, BOSCC) related to…
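The prototypical loss mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes the standard prototypical-network formulation: each speaker's prototype is the mean of its support embeddings, and each query embedding is classified by a softmax over negative squared Euclidean distances to the prototypes. The function names, the toy 2-D embeddings, and the episode layout are assumptions for illustration; the paper's encoder, distance choice, and training setup may differ.

```python
import math


def class_prototypes(support, support_labels):
    """Return one prototype per label: the mean of that label's support embeddings."""
    sums, counts = {}, {}
    for vec, label in zip(support, support_labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypical_loss(support, support_labels, query, query_labels):
    """Mean negative log-probability of the true label over the query set.

    Logits are negative squared distances to each prototype.
    """
    protos = class_prototypes(support, support_labels)
    total = 0.0
    for vec, label in zip(query, query_labels):
        logits = {c: -squared_distance(vec, p) for c, p in protos.items()}
        top = max(logits.values())
        # Numerically stable log-sum-exp over all prototypes.
        log_norm = top + math.log(sum(math.exp(v - top) for v in logits.values()))
        total += log_norm - logits[label]
    return total / len(query)


# Toy episode with two hypothetical speakers and 2-D embeddings.
support = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
support_labels = ["spk_a", "spk_a", "spk_b", "spk_b"]
query = [[0.0, 1.0], [10.0, 11.0]]
query_labels = ["spk_a", "spk_b"]
print(prototypical_loss(support, support_labels, query, query_labels))
```

In an actual fine-tuning loop the embeddings would come from the encoder and the loss would be backpropagated through it; this sketch only computes the loss value for fixed vectors.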
