Discriminative Neural Clustering for Speaker Diarisation

Qiujia Li, Florian L. Kreyssig, Chao Zhang, and Philip C. Woodland

arXiv:1910.09703 · eess.AS · November 24, 2020 · 23 cites

TL;DR

This paper introduces Discriminative Neural Clustering (DNC), a supervised sequence-to-sequence approach to speaker diarisation built on the Transformer architecture, which reduces the speaker error rate on the AMI dataset relative to spectral clustering.

Contribution

It presents a supervised neural clustering method, together with data augmentation schemes for training it on limited data, that outperforms traditional spectral clustering on speaker diarisation.

Findings

01. DNC reduces speaker error rate by 29.4% relative to spectral clustering.

02. Data augmentation schemes improve training effectiveness on limited data.

03. Transformer-based DNC is effective for speaker diarisation tasks.

Abstract

In this paper, we propose Discriminative Neural Clustering (DNC) that formulates data clustering with a maximum number of clusters as a supervised sequence-to-sequence learning problem. Compared to traditional unsupervised clustering algorithms, DNC learns clustering patterns from training data without requiring an explicit definition of a similarity measure. An implementation of DNC based on the Transformer architecture is shown to be effective on a speaker diarisation task using the challenging AMI dataset. Since AMI contains only 147 complete meetings as individual input sequences, data scarcity is a significant issue for training a Transformer model for DNC. Accordingly, this paper proposes three data augmentation schemes: sub-sequence randomisation, input vector randomisation, and Diaconis augmentation, which generates new data samples by rotating the entire input sequence of…
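
The abstract is cut off before it finishes describing Diaconis augmentation, so the following is only a minimal sketch of the general idea it names: rotating every vector in an input sequence with one shared random rotation. The function names `random_rotation` and `rotate_sequence`, the use of NumPy, and the QR-based sampling are assumptions made for illustration and are not taken from the paper or its repository.

```python
import numpy as np


def random_rotation(dim, rng):
    """Draw a random rotation matrix (orthogonal, determinant +1).

    Hypothetical helper: samples a Gaussian matrix and orthogonalises it
    with a QR decomposition. This is one common way to draw a random
    rotation, not necessarily the one used by the authors.
    """
    gauss = rng.standard_normal((dim, dim))
    q, r = np.linalg.qr(gauss)
    # Fix the sign ambiguity of QR by scaling each column of q.
    q = q * np.sign(np.diag(r))
    # Flip one column if needed so the result is a proper rotation.
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def rotate_sequence(seq, rng):
    """Rotate all vectors of one sequence with the same random rotation.

    seq has shape (sequence_length, dim). Because a single rotation is
    shared across the sequence, dot products between any two vectors
    (and hence their norms and angles) are unchanged.
    """
    rot = random_rotation(seq.shape[1], rng)
    return seq @ rot.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = rng.standard_normal((6, 4))  # toy sequence of six 4-dim vectors
    augmented = rotate_sequence(seq, rng)
    # Pairwise dot products survive the rotation.
    print(np.allclose(seq @ seq.T, augmented @ augmented.T))
```

Since the relative geometry of the sequence is preserved, the cluster assignment of each vector stays valid for the rotated copy, which is what makes a rotation usable as a label-preserving augmentation in a sketch like this.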

Code & Models

Repositories

FlorianKrey/DNC (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing