The Newsbridge - Telecom SudParis VoxCeleb Speaker Recognition Challenge 2022 System Description
Yannis Tevissen (ARMEDIA-SAMOVAR), Jérôme Boudy (ARMEDIA-SAMOVAR), Frédéric Petitpont

TL;DR
This paper presents a novel multi-stream voice activity detection method combined with standard diarization techniques, achieving near state-of-the-art speaker diarization results in the VoxCeleb Challenge 2022.
Contribution
The paper introduces a multi-stream voice activity detection approach whose decision protocol is based on classifier entropy, with the aim of improving diarization performance.
Findings
Achieved near state-of-the-art results in speaker diarization.
Demonstrated effectiveness of combining multiple VAD algorithms.
Showed that strong baseline methods can yield competitive results.
Abstract
We describe the system used by our team in the speaker diarization track of the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC 2022). Our solution was designed around a new combination of voice activity detection algorithms that draws on the strengths of several systems. We introduce a novel multi-stream approach with a decision protocol based on classifier entropy. We call this method multi-stream voice activity detection and use it with standard baseline diarization embeddings, clustering, and resegmentation. With this work, we demonstrate that, using a strong baseline and working only on voice activity detection, one can achieve close to state-of-the-art results.
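The abstract does not spell out the decision protocol, so the following is only a minimal sketch of one plausible reading of an entropy-based multi-stream decision, not the authors' implementation. It assumes each voice activity detection stream outputs a per-frame speech probability, and that for each frame the stream with the lowest binary entropy (the most confident one) makes the decision. The function names `binary_entropy` and `fuse_streams`, the 0.5 threshold, and the per-frame selection rule are all hypothetical.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy (in bits) of a speech / non-speech posterior p."""
    # Clamp to avoid log(0) for fully confident outputs.
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def fuse_streams(stream_probs, threshold=0.5):
    """Fuse several VAD streams frame by frame.

    stream_probs: list of streams, each a list of per-frame speech
    probabilities of equal length (assumed input format).
    Returns one boolean speech decision per frame.

    Assumed rule (not taken from the paper): for each frame, trust the
    stream whose posterior has the lowest entropy.
    """
    if not stream_probs:
        raise ValueError("at least one stream is required")
    n_frames = len(stream_probs[0])
    if any(len(stream) != n_frames for stream in stream_probs):
        raise ValueError("all streams must have the same number of frames")

    decisions = []
    for t in range(n_frames):
        frame_probs = [stream[t] for stream in stream_probs]
        # Pick the most confident stream for this frame.
        most_confident = min(frame_probs, key=binary_entropy)
        decisions.append(most_confident >= threshold)
    return decisions


# Two hypothetical streams over three frames.
streams = [
    [0.90, 0.55, 0.20],
    [0.60, 0.05, 0.45],
]
fused = fuse_streams(streams)
```

In this toy input, the first stream is more confident on the first and third frames and the second stream on the second frame, so each frame's decision comes from a different detector. A real system would likely smooth decisions over time and could weight streams rather than select a single one.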
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems
