TCG CREST System Description for the DISPLACE-M Challenge
Nikhil Raghav, Md Sahidullah

TL;DR
This paper describes the TCG CREST system for speaker diarization in noisy medical conversations. It compares different VAD and clustering methods and reports a diarization error rate (DER) improvement of approximately 39% over the baseline in the DISPLACE-M challenge.
Contribution
It evaluates a hybrid end-to-end neural diarization system (Diarizen) alongside several novel spectral clustering variants, and shows improved performance over baseline methods in challenging real-world scenarios.
Findings
The Diarizen system improved the diarization error rate (DER) by approximately 39% over the baseline.
The best system achieved a DER of 10.37% on the development set.
The team ranked fifth out of eleven in the challenge.
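For readers unfamiliar with the metric, the sketch below shows the standard definition of DER (missed speech, false alarm, and speaker confusion time divided by total reference speech time) and how a relative improvement over a baseline could be computed. Whether the reported 39% is a relative or an absolute reduction is not stated here, so the relative reading is an assumption. The function names and the numbers in the usage note are hypothetical and are not results from the paper.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER as a fraction of total reference speech time.

    All arguments are durations in the same unit (for example, seconds).
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


def relative_improvement(baseline_der, system_der):
    """Return the relative DER reduction of a system over a baseline."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - system_der) / baseline_der
```

As an illustration with made-up values, diarization_error_rate(5.0, 3.0, 2.0, 100.0) gives a DER of 10%, and relative_improvement(20.0, 15.0) gives a 25% relative reduction.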
Abstract
This report presents the TCG CREST system description for Track 1 (Speaker Diarization) of the DISPLACE-M challenge, focusing on naturalistic medical conversations in noisy rural-healthcare scenarios. Our study evaluates the impact of various voice activity detection (VAD) methods and advanced clustering algorithms on overall speaker diarization (SD) performance. We compare and analyze two SD frameworks: a modular pipeline utilizing SpeechBrain with ECAPA-TDNN embeddings, and a state-of-the-art (SOTA) hybrid end-to-end neural diarization system, Diarizen, built on top of a pre-trained WavLM. With these frameworks, we explore diverse clustering techniques, including agglomerative hierarchical clustering (AHC), and multiple novel variants of spectral clustering, such as SC-adapt, SC-PNA, and SC-MK. Experimental results demonstrate that the Diarizen system provides an approximate …
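To make the clustering stage of a modular pipeline concrete, here is a minimal sketch of agglomerative hierarchical clustering (AHC) over speaker embeddings, using average linkage on cosine distance and a distance threshold as the stopping rule. It is not the authors' implementation: the linkage choice, the threshold, and the toy two-dimensional vectors mentioned in the usage note are all assumptions made for illustration, and real ECAPA-TDNN embeddings have far more dimensions.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def ahc_cluster(embeddings, threshold):
    """Average-linkage AHC; merging stops once the closest pair of
    clusters is farther apart than `threshold` (a hypothetical knob)."""
    # Start with every segment embedding in its own cluster.
    clusters = [[i] for i in range(len(embeddings))]

    def average_linkage(a, b):
        total = sum(
            cosine_distance(embeddings[i], embeddings[j])
            for i in a
            for j in b
        )
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > threshold:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    # Convert cluster membership into one speaker label per segment.
    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for index in members:
            labels[index] = label
    return labels
```

Called on four toy embeddings such as [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]] with a threshold of 0.5, the function groups the first two segments into one speaker and the last two into another. The threshold implicitly decides the number of speakers, which is one reason the choice of clustering method matters for DER.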
Taxonomy
Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Radiology practices and education
