GIST-AiTeR Speaker Diarization System for VoxCeleb Speaker Recognition Challenge (VoxSRC) 2023

Dongkeon Park; Ji Won Kim; Kang Ryeol Kim; Do Hyun Lee; Hong Kook Kim

arXiv:2308.07788 · eess.AS · August 28, 2023

TL;DR

This report presents the GIST-AiTeR speaker diarization system for VoxSRC-23 Track 4: an ensemble of ResNet293 and MFA-Conformer models that reaches a diarization error rate (DER) of 3.50% on VAL46 and 4.88% on the challenge test set.

Contribution

The system runs ResNet293 and MFA-Conformer models with different combinations of segment and hop length, then combines them into an ensemble that achieves a lower DER on VAL46 than either single model (3.50% versus 3.65% and 3.83%).

Findings

01

The submitted ensemble model achieved a DER of 3.50% on VAL46.

02

The ResNet293 and MFA-Conformer models achieved DERs of 3.65% and 3.83% on VAL46, respectively.

03

The ensemble model achieved a DER of 4.88% on the VoxSRC-23 test set.
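
For readers unfamiliar with the metric, DER is conventionally defined as the fraction of reference speech time that is scored incorrectly. The formula below is the standard definition, not something taken from this report; the symbol names are chosen here for illustration, and the scoring settings used by the challenge (for example, any forgiveness collar) are not stated on this page.

```latex
\mathrm{DER} = \frac{T_{\mathrm{FA}} + T_{\mathrm{miss}} + T_{\mathrm{conf}}}{T_{\mathrm{total}}}
```

Here $T_{\mathrm{FA}}$ is the duration of false-alarm speech, $T_{\mathrm{miss}}$ the duration of missed speech, $T_{\mathrm{conf}}$ the duration attributed to the wrong speaker, and $T_{\mathrm{total}}$ the total duration of reference speech.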

Abstract

This report describes the system submitted by the GIST-AiTeR team to Track 4 of the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23). Our submission focuses on implementing diverse speaker diarization (SD) techniques, including ResNet293 and MFA-Conformer with different combinations of segment and hop length. These models are then combined into an ensemble model. The ResNet293 and MFA-Conformer models achieved diarization error rates (DERs) of 3.65% and 3.83% on VAL46, respectively. The submitted ensemble model achieved a DER of 3.50% on VAL46 and a DER of 4.88% on the VoxSRC-23 test set.
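
The abstract refers to "segment and hop length" without further detail. As a rough illustration of what these two parameters usually control in embedding-based diarization, the sketch below cuts a recording into fixed-length windows from which speaker embeddings would be extracted. The function name `sliding_windows` and the numeric values are hypothetical; the report's actual segment and hop settings are not given on this page.

```python
def sliding_windows(duration, segment, hop):
    """Return (start, end) times in seconds for windows of length `segment`
    spaced `hop` seconds apart over a recording of length `duration`.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if duration < 0 or segment <= 0 or hop <= 0:
        raise ValueError("duration must be >= 0; segment and hop must be > 0")

    windows = []
    i = 0
    while i * hop < duration:
        start = i * hop
        # The last window is truncated at the end of the recording.
        end = min(start + segment, duration)
        windows.append((start, end))
        if end >= duration:
            break
        i += 1
    return windows


# Illustrative values only: a 5 s recording, 2 s segments, 1 s hop.
for start, end in sliding_windows(5.0, 2.0, 1.0):
    print(f"{start:.1f}-{end:.1f} s")
```

A shorter hop produces more overlapping windows and finer time resolution at higher compute cost, while a longer segment gives each embedding more speech to work with. Varying both is one plausible way to obtain the diverse systems that an ensemble combines.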

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing