EML System Description for VoxCeleb Speaker Diarization Challenge 2020

Omid Ghahabi; Volker Fischer

arXiv:2010.12497·cs.SD·October 26, 2020

TL;DR

This technical report describes EML's submission to the first VoxCeleb speaker diarization challenge: an online system that decides on speaker labels approximately every 1.2 seconds and achieves better DER and JER than the challenge's offline baseline on the VoxConverse dev set.

Contribution

Application of the EML online speaker diarization algorithm, trained only on the VoxCeleb2 dev dataset, to a challenge designed for offline processing, with better accuracy than the offline baseline on the VoxConverse dev set.

Findings

01

Achieved much better DER and JER on the VoxConverse dev set than the offline baseline provided in the challenge.

02

Decides on speaker labels at runtime approximately every 1.2 seconds.

03

The whole diarization process runs at a real-time factor of about 0.01 on a single-CPU machine.
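
To put the reported real-time factor in concrete terms, here is a minimal sketch that assumes the usual definition of real-time factor (processing time divided by audio duration). The function names and the 10-minute recording are hypothetical and are not taken from the report.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor, assumed here to be processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_time(audio_seconds: float, rtf: float) -> float:
    """Estimated compute time for a recording at a given real-time factor."""
    return audio_seconds * rtf


# Hypothetical example: a 10-minute recording at the reported RTF of about 0.01.
audio = 600.0
estimated = processing_time(audio, 0.01)
print(f"about {estimated:.1f} s of compute for {audio:.0f} s of audio")
```

Under this assumption, a real-time factor of 0.01 means that processing takes roughly one hundredth of the audio duration, so the hypothetical 10-minute recording would need about 6 seconds of compute.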

Abstract

This technical report describes the EML submission to the first VoxCeleb speaker diarization challenge. Although the challenge targets offline processing of the signals, the submitted system is essentially the EML online algorithm, which decides on the speaker labels at runtime approximately every 1.2 sec. For the first phase of the challenge, only the VoxCeleb2 dev dataset was used for training. Results on the provided VoxConverse dev set show much better accuracy in terms of both DER and JER than the offline baseline provided in the challenge. The real-time factor of the whole diarization process is about 0.01 on a single-CPU machine.
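
For readers unfamiliar with the two metrics, the commonly used definitions of diarization error rate (DER) and Jaccard error rate (JER) are sketched below. These are the standard formulations, not formulas quoted from the report, and scoring details such as collar size and overlap handling are not stated here. The symbols are assumptions introduced for illustration: $T_{\text{miss}}$, $T_{\text{FA}}$ and $T_{\text{conf}}$ are the durations of missed speech, false-alarm speech and speaker confusion, $T_{\text{ref}}$ is the total reference speech time, and $N$ is the number of reference speakers.

```latex
\mathrm{DER} = \frac{T_{\text{miss}} + T_{\text{FA}} + T_{\text{conf}}}{T_{\text{ref}}}
\qquad
\mathrm{JER} = \frac{1}{N} \sum_{i=1}^{N} \frac{\mathrm{FA}_i + \mathrm{MISS}_i}{\mathrm{TOTAL}_i}
```

In the JER expression, each reference speaker $i$ is first paired with a system speaker; $\mathrm{FA}_i$ and $\mathrm{MISS}_i$ are that pair's false-alarm and missed durations, and $\mathrm{TOTAL}_i$ is the duration of the union of the reference and system segments for the pair. Lower is better for both metrics.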

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing