The HUAWEI Speaker Diarisation System for the VoxCeleb Speaker Diarisation Challenge

Renyu Wang; Ruilin Tong; Yu Ting Yeung; Xiao Chen

arXiv:2010.11657 · cs.SD · October 26, 2020 · 1 citation

TL;DR

This paper presents a speaker diarisation system for Track 4 of the VoxCeleb Speaker Recognition Challenge 2020. It combines a neural-network-based speech enhancement front-end and a neural-network-based VAD with AHC and VB-HMM clustering, improving substantially on the baseline and reaching a DER of 10.45% on the evaluation set.

Contribution

The system adds a neural-network-based speech enhancement front-end and replaces conventional energy-based VAD with a neural-network-based VAD, improving diarisation accuracy over the baseline system.
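This summary does not say how the neural VAD's frame-level output is turned into speech segments. One common post-processing step, sketched below as an assumption rather than the authors' method, thresholds per-frame speech posteriors, bridges short non-speech gaps and drops segments too short to embed. The function name `posteriors_to_segments`, the 10 ms frame shift, the 0.5 threshold and the minimum run lengths are all hypothetical values chosen for illustration.

```python
from typing import List, Tuple


def posteriors_to_segments(
    speech_probs: List[float],
    frame_shift: float = 0.01,   # hypothetical 10 ms frame shift
    threshold: float = 0.5,      # hypothetical speech/non-speech decision threshold
    min_speech_frames: int = 3,  # hypothetical: drop speech runs shorter than this
    min_gap_frames: int = 3,     # hypothetical: bridge non-speech gaps shorter than this
) -> List[Tuple[float, float]]:
    """Turn per-frame speech posteriors into (start, end) segments in seconds."""
    decisions = [p >= threshold for p in speech_probs]

    # Collect runs of consecutive speech frames as [start, end) frame indices.
    runs: List[List[int]] = []
    for i, is_speech in enumerate(decisions):
        if is_speech:
            if runs and runs[-1][1] == i:
                runs[-1][1] = i + 1
            else:
                runs.append([i, i + 1])

    # Merge runs that are separated by only a short non-speech gap.
    merged: List[List[int]] = []
    for start, end in runs:
        if merged and start - merged[-1][1] < min_gap_frames:
            merged[-1][1] = end
        else:
            merged.append([start, end])

    # Discard segments too short to yield a reliable speaker embedding.
    return [
        (round(s * frame_shift, 3), round(e * frame_shift, 3))
        for s, e in merged
        if e - s >= min_speech_frames
    ]


if __name__ == "__main__":
    probs = [0.1, 0.9, 0.9, 0.9, 0.2, 0.8, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.7, 0.1]
    print(posteriors_to_segments(probs))
```

In the example, the one-frame dip at frame 4 is bridged, and the isolated speech frame at frame 12 is discarded as too short.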

Findings

01

Achieved a DER of 10.45% on the evaluation set

02

Replaced energy-based VAD with a neural-network-based VAD for more accurate speech segmentation

03

Clustered speakers using AHC of x-vectors together with VB-HMM-based iterative clustering
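DER is the fraction of reference speech time that is missed, falsely detected as speech, or attributed to the wrong speaker under the best one-to-one mapping between hypothesis and reference speakers: DER = (missed + false alarm + confusion) / total reference speech. The sketch below is a simplified frame-level version for illustration only. It assumes at most one speaker per frame and no forgiveness collar, whereas standard scoring tools also account for overlapping speech and can apply a collar. The function name `frame_der` is hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Optional


def frame_der(reference: List[Optional[str]], hypothesis: List[Optional[str]]) -> float:
    """Simplified frame-level DER; None marks a non-speech frame.

    Assumes one speaker per frame and no collar (not the official scoring).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")

    ref_speakers = sorted({r for r in reference if r is not None})
    hyp_speakers = sorted({h for h in hypothesis if h is not None})

    frames = list(zip(reference, hypothesis))
    missed = sum(1 for r, h in frames if r is not None and h is None)
    false_alarm = sum(1 for r, h in frames if r is None and h is not None)
    total_speech = sum(1 for r in reference if r is not None)

    # Frames where both sides detect speech can only contribute speaker confusion.
    both_speech = [(r, h) for r, h in frames if r is not None and h is not None]

    # Brute-force the one-to-one hyp -> ref mapping that maximises matched frames.
    # Padding with None lets extra hypothesis speakers stay unmapped.
    candidates: List[Optional[str]] = list(ref_speakers) + [None] * len(hyp_speakers)
    best_matched = 0
    for assignment in permutations(candidates, len(hyp_speakers)):
        mapping: Dict[str, Optional[str]] = dict(zip(hyp_speakers, assignment))
        matched = sum(1 for r, h in both_speech if mapping[h] == r)
        best_matched = max(best_matched, matched)

    confusion = len(both_speech) - best_matched
    return (missed + false_alarm + confusion) / total_speech


if __name__ == "__main__":
    ref = ["A", "A", "A", "B", "B", None, None, "B"]
    hyp = ["s1", "s1", "s2", "s2", "s2", "s2", None, None]
    print(frame_der(ref, hyp))
```

Brute-forcing the mapping is only practical for a handful of speakers; real scorers solve the same assignment problem with the Hungarian algorithm.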

Abstract

This paper describes the system setup of our submission to the speaker diarisation track (Track 4) of the VoxCeleb Speaker Recognition Challenge 2020. Our diarisation system uses a well-trained neural-network-based speech enhancement model as the pre-processing front-end for input speech signals. We replace conventional energy-based voice activity detection (VAD) with a neural-network-based VAD, which more accurately annotates segments containing only background music, noise, and other interference; this is crucial to diarisation performance. For speaker clustering, we apply agglomerative hierarchical clustering (AHC) of x-vectors and iterative clustering based on a variational Bayesian hidden Markov model (VB-HMM). Experimental results demonstrate that our proposed system achieves substantial improvements over the baseline system, yielding diarisation error…
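The abstract names AHC of x-vectors as the first clustering stage. The sketch below shows generic average-linkage AHC over per-segment embeddings; it is not the authors' implementation. Cosine similarity and the stopping threshold of 0.5 are assumptions made for simplicity, since the abstract does not state the scoring backend or threshold. The function names `cosine` and `ahc` are hypothetical, and the VB-HMM refinement that follows AHC is not shown.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def ahc(embeddings: List[List[float]], stop_threshold: float = 0.5) -> List[int]:
    """Average-linkage AHC on cosine similarity; returns one cluster label per segment.

    stop_threshold is a hypothetical value: merging stops once no pair of
    clusters has an average similarity at or above it.
    """
    n = len(embeddings)
    sim = [[cosine(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    clusters: List[List[int]] = [[i] for i in range(n)]

    while len(clusters) > 1:
        # Find the pair of clusters with the highest average pairwise similarity.
        best_score, best_pair = -math.inf, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = sum(sim[i][j] for i in clusters[a] for j in clusters[b]) / (
                    len(clusters[a]) * len(clusters[b])
                )
                if score > best_score:
                    best_score, best_pair = score, (a, b)
        if best_score < stop_threshold:
            break  # no remaining pair looks like the same speaker
        a, b = best_pair  # a < b, so deleting index b leaves index a valid
        clusters[a].extend(clusters[b])
        del clusters[b]

    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    xvectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
    print(ahc(xvectors))
```

Here the two-dimensional vectors stand in for real x-vectors, which typically have a few hundred dimensions. The threshold directly controls the estimated number of speakers: raising it yields more, smaller clusters.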

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing