TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge
Bowen Pang, Huan Zhao, Gaosheng Zhang, Xiaoyue Yang, Yang Sun, Li Zhang, Qing Wang, Lei Xie

TL;DR
This paper presents the TSUP speaker diarization system for the ISCSLP 2022 CSSD challenge, comparing spectral clustering, TS-VAD, and EEND approaches, with spectral clustering performing best under the new CDER metric.
Contribution
The paper introduces a comprehensive evaluation of three diarization methods on short-phrase conversations using a new metric, highlighting the effectiveness of spectral clustering and the impact of hyperparameter tuning.
Findings
Spectral clustering outperforms other methods under CDER.
Hyperparameter tuning significantly improves diarization accuracy.
Multi-system fusion with DOVER-LAP degrades performance.
Abstract
This paper describes the TSUP team's submission to the ISCSLP 2022 conversational short-phrase speaker diarization (CSSD) challenge, which focuses on short-phrase conversations and uses a new evaluation metric called conversational diarization error rate (CDER). In this challenge, we explore three typical kinds of speaker diarization systems: spectral clustering (SC) based diarization, target-speaker voice activity detection (TS-VAD), and end-to-end neural diarization (EEND). Our major findings are summarized as follows. First, the SC approach is favored over the other two approaches under the new CDER metric. Second, hyperparameter tuning is essential to CDER for all three types of speaker diarization systems. Specifically, CDER becomes smaller as the sub-segment length is set longer. Finally, multi-system fusion through DOVER-LAP will worsen…
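To make the sub-segment and spectral clustering ideas in the abstract concrete, here is a minimal sketch rather than the authors' implementation. It assumes a two-speaker conversation, cosine-based affinities, and a simple split on the sign of the Fiedler vector. The function names, the window and shift values, and the synthetic embeddings are all hypothetical and chosen only for illustration.

```python
import numpy as np


def split_into_subsegments(start, end, win=1.5, shift=0.75):
    """Cut one speech segment [start, end) into overlapping sub-segments.

    `win` and `shift` (seconds) are illustrative defaults, not values
    taken from the paper.
    """
    subsegs = []
    t = start
    while t < end:
        subsegs.append((t, min(t + win, end)))
        if t + win >= end:
            break  # the last window already reaches the segment end
        t += shift
    return subsegs


def two_speaker_spectral_labels(embeddings):
    """Assign each embedding to one of two speakers (assumed count).

    Builds a cosine-based affinity matrix, forms the unnormalized graph
    Laplacian, and splits on the sign of its second-smallest eigenvector.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    affinity = (1.0 + x @ x.T) / 2.0  # map cosine from [-1, 1] to [0, 1]
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]
    return (fiedler > 0).astype(int)


# Synthetic stand-ins for sub-segment speaker embeddings (hypothetical data).
rng = np.random.default_rng(0)
dim = 8
spk_a = np.eye(dim)[0] + 0.05 * rng.standard_normal((5, dim))
spk_b = np.eye(dim)[1] + 0.05 * rng.standard_normal((5, dim))
labels = two_speaker_spectral_labels(np.vstack([spk_a, spk_b]))
```

A longer `win` yields fewer, longer sub-segments per speech segment, which is the hyperparameter the abstract reports as affecting CDER. A real system would extract embeddings from audio with a speaker model and would typically estimate the number of speakers instead of fixing it at two.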
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
