System Description for the Displace Speaker Diarization Challenge 2023

Ali Aliyev

arXiv:2406.15516·cs.CL·June 25, 2024

TL;DR

This paper presents a speaker diarization system for Displace 2023 that combines VAD, CNN feature extraction, and spectral clustering, achieving a DER of 27.1% on the development data and 27.4% on the phase-1 evaluation data without language-specific training.

Contribution

The paper introduces a speaker diarization approach that does not require language-specific training, utilizing a CNN-based feature extractor and spectral clustering.

Findings

01. DER of 27.1% on development data

02. DER of 27.4% on evaluation data

03. Effective without Hindi language training
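For readers unfamiliar with the metric, DER (diarization error rate) is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The sketch below only illustrates that arithmetic; the function name and the durations are hypothetical and are not taken from the paper or the challenge scoring tool.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER as a fraction, given durations in seconds.

    All argument names are illustrative; the official challenge scorer
    may apply additional rules (e.g. collars, overlap handling).
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical durations, chosen only to show the computation.
der = diarization_error_rate(missed=6.0, false_alarm=4.0,
                             confusion=10.0, total_speech=80.0)
print(f"DER = {der:.1%}")
```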

Abstract

This paper describes our solution for the Diarization of Speaker and Language in Conversational Environments Challenge (Displace 2023). We combine voice activity detection (VAD) to find segments containing speech, a ResNet-based CNN to extract features from these segments, and spectral clustering to cluster the features. Even though it was not trained on Hindi, the described algorithm achieves a diarization error rate (DER) of 27.1% on the development part and 27.4% on the phase-1 evaluation part of the dataset.
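The clustering stage of such a pipeline can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes that per-segment embeddings are already available (the VAD and ResNet stages are omitted), that the number of speakers is known, and that a cosine affinity, a normalized graph Laplacian, and a small k-means step are used. None of these choices is confirmed by the abstract, and the names `spectral_cluster` and `_kmeans` are hypothetical.

```python
import numpy as np


def _kmeans(points, k, n_iter=50):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest chosen center.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def spectral_cluster(embeddings, n_speakers):
    """Assign a speaker label to each segment embedding (illustrative only)."""
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    # Assumed affinity: cosine similarity with negative values clipped to zero.
    affinity = np.clip(x @ x.T, 0.0, None)
    inv_sqrt_deg = 1.0 / np.sqrt(affinity.sum(axis=1))
    norm_aff = affinity * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]
    laplacian = np.eye(len(x)) - norm_aff
    # eigh returns eigenvalues in ascending order; keep the smallest ones.
    _, vecs = np.linalg.eigh(laplacian)
    spectral = vecs[:, :n_speakers]
    spectral = spectral / (np.linalg.norm(spectral, axis=1, keepdims=True) + 1e-12)
    return _kmeans(spectral, n_speakers)


# Synthetic embeddings standing in for CNN features of speech segments.
rng = np.random.default_rng(0)
speaker_a = np.array([1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal((5, 3))
speaker_b = np.array([0.0, 1.0, 0.0]) + 0.05 * rng.standard_normal((5, 3))
labels = spectral_cluster(np.vstack([speaker_a, speaker_b]), n_speakers=2)
print(labels)
```

In practice the number of speakers is often unknown and may be estimated, for example from gaps in the Laplacian eigenvalues; the abstract does not say how this system handles that.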

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques

Methods: Average Pooling · Max Pooling · Spectral Clustering · Global Average Pooling · Kaiming Initialization · Convolution