TCG CREST System Description for the Second DISPLACE Challenge

Nikhil Raghav; Subhajit Saha; Md Sahidullah; Swagatam Das

arXiv:2409.15356·eess.AS·September 25, 2024

Open Access

TL;DR

This paper describes speaker and language diarization systems built for the Second DISPLACE Challenge (2024). The systems combine speech enhancement, VAD, neural embeddings, and embedding fusion with spectral clustering, and achieve about 7% relative improvement over the challenge baseline in speaker diarization.

Contribution

The paper introduces a comprehensive diarization system combining multiple speech processing techniques and fusion strategies, implemented with SpeechBrain, for multilingual and multi-speaker scenarios.

Findings

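01  About 7% relative improvement in speaker diarization (Track 1) over the challenge baseline

02  No improvement in language diarization (Track 2) over the challenge baseline

03  Final submissions use spectral clustering for both tasks; fusion of embedding extraction models was also explored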
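
The 7% figure in the first finding is a relative improvement, not an absolute one. The sketch below shows the arithmetic, assuming the score is an error rate where lower is better (for example, diarization error rate). The function name and the two numbers are hypothetical and chosen only to reproduce a 7% relative gain; they are not results from the paper.

```python
def relative_improvement(baseline_error: float, system_error: float) -> float:
    """Relative reduction in an error metric, in percent (lower error is better)."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - system_error) / baseline_error


# Hypothetical error rates, NOT taken from the paper.
baseline_der = 30.0
system_der = 27.9

gain = relative_improvement(baseline_der, system_der)
print(f"{gain:.1f}% relative improvement")
```

With these illustrative values, an absolute drop of 2.1 points corresponds to a 7% relative improvement, which is why the two kinds of figures should not be compared directly.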

Abstract

In this report, we describe the speaker diarization (SD) and language diarization (LD) systems developed by our team for the Second DISPLACE Challenge, 2024. Our contributions were dedicated to Track 1 for SD and Track 2 for LD in multilingual and multi-speaker scenarios. We investigated different speech enhancement techniques, voice activity detection (VAD) techniques, unsupervised domain categorization, and neural embedding extraction architectures. We also exploited the fusion of various embedding extraction models. We implemented our system with the open-source SpeechBrain toolkit. Our final submissions use spectral clustering for both speaker and language diarization. We achieve about $7\%$ relative improvement over the challenge baseline in Track 1. We did not obtain improvement over the challenge baseline in Track 2.
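
The abstract names two building blocks, fusion of embedding extractors and spectral clustering, without giving details here. The following is a minimal NumPy sketch of how the two can fit together, not the authors' implementation. It assumes fusion by concatenating L2-normalized embeddings, a cosine-similarity affinity, and exactly two clusters split by the sign of the second eigenvector of the normalized Laplacian. All function names and the synthetic data are hypothetical; the actual system is built on SpeechBrain and may differ in every one of these choices.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit Euclidean length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def fuse_embeddings(emb_a, emb_b):
    """Assumed fusion rule: concatenate L2-normalized embeddings per segment."""
    return np.hstack([l2_normalize(emb_a), l2_normalize(emb_b)])


def spectral_two_way(embeddings):
    """Split segments into two clusters with the normalized graph Laplacian."""
    x = l2_normalize(embeddings)
    affinity = np.clip(x @ x.T, 0.0, None)  # cosine similarity, negatives set to 0
    np.fill_diagonal(affinity, 0.0)         # no self-loops
    degree = np.maximum(affinity.sum(axis=1), 1e-12)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(x)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1] * d_inv_sqrt    # second-smallest eigenvector, rescaled
    return (fiedler > 0).astype(int)


def toy_embeddings(rng, dim, n_per_speaker=5, noise=0.05):
    """Synthetic segment embeddings for two speakers (illustration only)."""
    centers = np.zeros((2, dim))
    centers[0, 0], centers[0, 1] = 1.0, 0.2
    centers[1, 0], centers[1, 1] = 0.2, 1.0
    points = np.repeat(centers, n_per_speaker, axis=0)
    return points + noise * rng.standard_normal(points.shape)


rng = np.random.default_rng(0)
# Two hypothetical extractors producing 4- and 6-dimensional embeddings.
fused = fuse_embeddings(toy_embeddings(rng, 4), toy_embeddings(rng, 6))
labels = spectral_two_way(fused)
print(labels)  # first five segments share one label, last five the other
```

Which cluster receives label 0 or 1 is arbitrary, since the sign of an eigenvector is not fixed. A system handling an unknown number of speakers would need to estimate the cluster count and run k-means on several eigenvectors rather than thresholding one.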

Taxonomy

Topics: Medical Imaging Techniques and Applications

Methods: Spectral Clustering