Meta-learning with Latent Space Clustering in Generative Adversarial Network for Speaker Diarization

Monisankha Pal, Manoj Kumar, Raghuveer Peri, Tae Jin Park, So Hyun Kim, Catherine Lord, Somer Bishop, and Shrikanth Narayanan

arXiv:2007.09635 · eess.AS · July 21, 2020

TL;DR

This paper introduces a meta-learning-enhanced, GAN-based approach to speaker diarization that improves robustness and domain adaptation, achieving relative diarization error rate reductions of up to 53.93% across diverse multi-domain datasets.

Contribution

The work extends ClusterGAN with meta-learning to create MCGAN, enabling rapid adaptation and improved robustness in speaker diarization across challenging environments.

Findings

01 MCGAN embeddings outperform x-vectors in diarization accuracy.

02 The proposed method achieves up to 53.93% relative reduction in diarization error rate (DER).

03 MCGAN improves speaker count estimation and short segment diarization.
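
For readers unfamiliar with the term, a relative DER reduction compares a system's error rate against a baseline's, as a percentage of the baseline. The sketch below shows that arithmetic only; the function name and the numbers in the usage line are hypothetical and are not results from the paper.

```python
def relative_der_reduction(baseline_der: float, system_der: float) -> float:
    """Return the relative DER reduction in percent.

    Both arguments are DER values on the same scale (e.g. percent).
    A positive result means the system improves on the baseline.
    """
    if baseline_der <= 0:
        raise ValueError("baseline DER must be positive")
    return 100.0 * (baseline_der - system_der) / baseline_der


# Hypothetical values, chosen only to illustrate the formula.
example = relative_der_reduction(baseline_der=10.0, system_der=5.0)
```

With these made-up inputs, halving the error rate corresponds to a 50% relative reduction.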

Abstract

Most speaker diarization systems built on x-vector embeddings are vulnerable to noisy environments and lack domain robustness. Earlier work on speaker diarization using a generative adversarial network (GAN) with an encoder network (ClusterGAN) to project input x-vectors into a latent space has shown promising performance on meeting data. In this paper, we extend the ClusterGAN network to improve diarization robustness and enable rapid generalization across various challenging domains. To this end, we take the pre-trained encoder from the ClusterGAN and fine-tune it using prototypical loss (meta-ClusterGAN or MCGAN) under the meta-learning paradigm. Experiments are conducted on CALLHOME telephonic conversations, AMI meeting data, DIHARD II (dev set), which includes a challenging multi-domain corpus, and two child-clinician interaction corpora (ADOS, BOSCC) related to…
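
The abstract names prototypical loss as the fine-tuning objective without spelling it out. The following is a minimal sketch of that loss in its commonly used form, assuming squared Euclidean distance and a support/query split per episode; neither detail is confirmed by the text above. The function names and the toy two-dimensional embeddings are hypothetical, and the encoder that would produce real embeddings is omitted.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_prototypes(support, labels):
    """Average the support embeddings of each speaker label."""
    sums, counts = {}, {}
    for vec, lab in zip(support, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
            counts[lab] = 0
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def prototypical_loss(support, support_labels, query, query_labels):
    """Mean negative log-probability of each query's true speaker.

    Probabilities come from a softmax over negative squared distances
    between the query embedding and every speaker prototype.
    """
    protos = class_prototypes(support, support_labels)
    total = 0.0
    for vec, lab in zip(query, query_labels):
        logits = {k: -squared_distance(vec, p) for k, p in protos.items()}
        peak = max(logits.values())  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        total += log_norm - logits[lab]
    return total / len(query)


# Toy episode with two hypothetical speakers, "a" and "b".
support = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
support_labels = ["a", "a", "b", "b"]
loss_correct = prototypical_loss(support, support_labels, [[0.0, 0.4]], ["a"])
loss_wrong = prototypical_loss(support, support_labels, [[0.0, 0.4]], ["b"])
```

In this toy episode the query sits next to speaker "a", so the loss is near zero when it is labelled "a" and large when it is labelled "b". Minimizing the loss over many episodes is what pulls same-speaker embeddings toward a shared prototype.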
