System Description for the Displace Speaker Diarization Challenge 2023
Ali Aliyev

TL;DR
This paper presents a speaker diarization system for Displace 2023, combining VAD, CNN feature extraction, and spectral clustering, achieving competitive DER metrics without language-specific training.
Contribution
The paper introduces a speaker diarization approach that does not require language-specific training, utilizing a CNN-based feature extractor and spectral clustering.
Findings
DER of 27.1% on development data
DER of 27.4% on evaluation data
Effective without Hindi language training
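For context on the DER figures above: diarization error rate is conventionally the sum of missed speech, false-alarm speech, and speaker confusion, divided by the total reference speech. The sketch below is a simplified frame-level version of that definition. It assumes one speaker label per frame, no overlapped speech, and no forgiveness collar. It is not the challenge's scoring tool, and `frame_der` is a hypothetical helper name used only for illustration.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Simplified frame-level DER.

    Both inputs are equal-length lists with one label per frame.
    None marks non-speech; any other value is a speaker id.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    pairs = list(zip(reference, hypothesis))
    ref_speech = sum(r is not None for r, _ in pairs)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")

    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_ids = sorted({r for r in reference if r is not None})
    hyp_ids = sorted({h for h in hypothesis if h is not None})

    # Find the one-to-one speaker mapping that maximises agreement.
    # Brute force over permutations: fine for a handful of speakers only.
    n = max(len(ref_ids), len(hyp_ids))
    ref_slots = ref_ids + [None] * (n - len(ref_ids))
    hyp_slots = hyp_ids + [None] * (n - len(hyp_ids))
    best_match = 0
    for perm in permutations(ref_slots):
        mapping = {h: r for h, r in zip(hyp_slots, perm)
                   if h is not None and r is not None}
        matched = sum(1 for r, h in both if mapping.get(h) == r)
        best_match = max(best_match, matched)

    confusion = len(both) - best_match
    return (missed + false_alarm + confusion) / ref_speech
```

Because the denominator is reference speech only, a system with many false alarms can score above 100%.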
Abstract
This paper describes our solution for the Diarization of Speaker and Language in Conversational Environments Challenge (Displace 2023). We combine voice activity detection (VAD) to find segments containing speech, a ResNet-based CNN to extract features from these segments, and spectral clustering to group the features. Even though it was not trained on Hindi, the described algorithm achieves a DER of 27.1% on the development part and 27.4% on the phase-1 evaluation part of the dataset.
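The clustering stage can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the affinity measure, the Laplacian variant, or how the number of speakers is chosen. The sketch assumes cosine affinity with negative values clipped, a symmetric normalized Laplacian, a known speaker count, and a small deterministic k-means. `spectral_cluster` and `_kmeans` are hypothetical names, and the embeddings stand in for the CNN output of each speech segment.

```python
import numpy as np


def _kmeans(points, k, n_iter=50):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)

    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster(embeddings, n_speakers):
    """Assign one cluster label per segment embedding (assumed design choices)."""
    x = np.asarray(embeddings, dtype=float)
    x = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)

    # Cosine affinity between segments; negatives clipped, self-loops removed.
    affinity = np.clip(x @ x.T, 0.0, None)
    np.fill_diagonal(affinity, 0.0)

    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    laplacian = np.eye(len(x)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; keep the smallest n_speakers.
    _, vecs = np.linalg.eigh(laplacian)
    spectral = vecs[:, :n_speakers]
    spectral = spectral / np.maximum(
        np.linalg.norm(spectral, axis=1, keepdims=True), 1e-12
    )
    return _kmeans(spectral, n_speakers)
```

In a full pipeline, the VAD output would define the segments, the CNN would map each segment to an embedding row, and the returned labels would be mapped back to segment timestamps to form the diarization output.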
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Average Pooling · Max Pooling · Spectral Clustering · Global Average Pooling · Kaiming Initialization · Convolution
