TCG CREST System Description for the Second DISPLACE Challenge
Nikhil Raghav, Subhajit Saha, Md Sahidullah, Swagatam Das

TL;DR
This paper describes the speaker and language diarization systems built for the Second DISPLACE Challenge (2024). The systems combine speech enhancement, VAD, neural embedding extraction, embedding fusion, and spectral clustering. They achieve about 7% relative improvement over the challenge baseline in speaker diarization and no improvement in language diarization.
Contribution
The paper introduces a comprehensive diarization system combining multiple speech processing techniques and fusion strategies, implemented with SpeechBrain, for multilingual and multi-speaker scenarios.
Findings
About 7% relative improvement over the challenge baseline in speaker diarization (Track 1)
No improvement over the challenge baseline in language diarization (Track 2)
Effective use of spectral clustering and embedding fusion
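For readers unfamiliar with the term, a relative improvement compares the reduction in an error metric against the baseline value of that metric. The sketch below assumes a lower-is-better error rate (such as diarization error rate); the function name and the numbers in the usage line are illustrative placeholders, not figures from the challenge.

```python
def relative_improvement(baseline_error: float, system_error: float) -> float:
    """Return the relative improvement in percent for a lower-is-better metric.

    Both arguments are error rates on the same scale (for example, percent).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - system_error) / baseline_error


# Hypothetical values chosen only to illustrate the arithmetic.
gain = relative_improvement(baseline_error=30.0, system_error=27.9)
print(f"relative improvement: {gain:.1f}%")
```

A negative return value would indicate that the system is worse than the baseline.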
Abstract
In this report, we describe the speaker diarization (SD) and language diarization (LD) systems developed by our team for the Second DISPLACE Challenge, 2024. Our contributions were dedicated to Track 1 for SD and Track 2 for LD in multilingual and multi-speaker scenarios. We investigated different speech enhancement techniques, voice activity detection (VAD) techniques, unsupervised domain categorization, and neural embedding extraction architectures. We also exploited the fusion of various embedding extraction models. We implemented our system with the open-source SpeechBrain toolkit. Our final submissions use spectral clustering for both speaker and language diarization. We achieve about 7% relative improvement over the challenge baseline in Track 1. We did not obtain any improvement over the challenge baseline in Track 2.
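The clustering stage mentioned in the abstract can be illustrated with a minimal NumPy sketch of spectral clustering over segment embeddings. This is a generic textbook formulation under stated assumptions, not the authors' SpeechBrain configuration: the cosine affinity, the known number of clusters, the simple k-means step, and the concatenation-based `fuse_embeddings` helper are all choices made here for illustration, and the summary does not say how the systems actually fused embeddings.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (guarding against zero rows)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def fuse_embeddings(*embedding_sets: np.ndarray) -> np.ndarray:
    """Hypothetical fusion: concatenate L2-normalised embeddings per segment."""
    return np.concatenate([l2_normalize(e) for e in embedding_sets], axis=1)


def spectral_cluster(embeddings: np.ndarray, num_clusters: int,
                     num_iters: int = 50) -> np.ndarray:
    """Assign a cluster label to each row of `embeddings`."""
    unit = l2_normalize(embeddings)
    # Cosine affinity, clipped so that edge weights are non-negative.
    affinity = np.clip(unit @ unit.T, 0.0, 1.0)
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    # Symmetric normalised Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    laplacian = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the smallest ones.
    _, eigvecs = np.linalg.eigh(laplacian)
    spectral = l2_normalize(eigvecs[:, :num_clusters])

    # Deterministic farthest-point initialisation for k-means.
    centroids = [spectral[0]]
    for _ in range(1, num_clusters):
        dists = np.min([np.linalg.norm(spectral - c, axis=1) for c in centroids], axis=0)
        centroids.append(spectral[np.argmax(dists)])
    centroids = np.array(centroids)

    labels = np.zeros(len(spectral), dtype=int)
    for _ in range(num_iters):
        dist = np.linalg.norm(spectral[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for k in range(num_clusters):
            if np.any(labels == k):
                centroids[k] = spectral[labels == k].mean(axis=0)
    return labels


# Synthetic stand-in for per-segment embeddings from two speakers.
rng = np.random.default_rng(0)
speaker_a = rng.normal(0.0, 0.1, size=(10, 8)) + np.eye(8)[0] * 5.0
speaker_b = rng.normal(0.0, 0.1, size=(10, 8)) + np.eye(8)[1] * 5.0
segments = np.vstack([speaker_a, speaker_b])
print(spectral_cluster(segments, num_clusters=2))
```

In practice the number of speakers or languages is usually unknown and has to be estimated, for example from the eigenvalue gaps of the Laplacian; the sketch fixes it to keep the example short.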
Taxonomy
Topics: Medical Imaging Techniques and Applications
Methods: Spectral Clustering
