The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines
Gaofeng Cheng, Yifan Chen, Runyan Yang, Qingxuan Li, Zehui Yang, Lingxuan Ye, Pengyuan Zhang, Qingqing Zhang, Lei Xie, Yanmin Qian, Kong Aik Lee, Yonghong Yan

TL;DR
This paper introduces the CSSD task, including a new dataset, an evaluation metric focused on conversational short phrases, and baseline methods for speaker diarization in casual conversations.
Contribution
It presents a new conversational speaker diarization task with a dedicated dataset, a novel evaluation metric (CDER), and baseline system implementation.
Findings
Created a 20-hour conversational speech test dataset with verified speaker timestamps.
Designed the conversational DER (CDER) metric to evaluate utterance-level accuracy.
Established baseline results using a Variational Bayes HMM x-vector (VBx) system.
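The gap between a duration-weighted score and an utterance-level score can be illustrated with a small sketch. This is not the paper's CDER definition: the function names, the 50% correctness threshold, and the toy segments are all assumptions made for illustration, speaker labels are taken to be already mapped between reference and hypothesis, hypothesis segments of one speaker are assumed not to overlap each other, and false alarms are ignored.

```python
from typing import List, Tuple

# (start_sec, end_sec, speaker_label); labels are assumed to be already
# mapped between reference and hypothesis.
Segment = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def correct_time(ref_seg: Segment, hyp: List[Segment]) -> float:
    """Seconds of a reference utterance that the hypothesis gives to the right speaker."""
    start, end, speaker = ref_seg
    return sum(
        overlap(start, end, h_start, h_end)
        for h_start, h_end, h_speaker in hyp
        if h_speaker == speaker
    )


def duration_weighted_error(ref: List[Segment], hyp: List[Segment]) -> float:
    """Simplified DER-style score: wrongly attributed reference time / total reference time."""
    total = sum(end - start for start, end, _ in ref)
    correct = sum(correct_time(r, hyp) for r in ref)
    return (total - correct) / total


def utterance_level_error(
    ref: List[Segment], hyp: List[Segment], min_correct_ratio: float = 0.5
) -> float:
    """Hypothetical utterance-level score: every utterance counts once, whatever its length.

    An utterance is counted as wrong when less than `min_correct_ratio` of its
    duration is attributed to the right speaker (threshold is an assumption).
    """
    wrong = sum(
        1 for r in ref if correct_time(r, hyp) < min_correct_ratio * (r[1] - r[0])
    )
    return wrong / len(ref)


# Toy conversation: speaker B only says a half-second "yeah" between two long turns of A.
ref = [(0.0, 10.0, "A"), (10.0, 10.5, "B"), (10.5, 20.0, "A")]
# The hypothesis misses B entirely and assigns everything to A.
hyp = [(0.0, 20.0, "A")]

print(f"{duration_weighted_error(ref, hyp):.3f}")  # 0.025
print(f"{utterance_level_error(ref, hyp):.3f}")    # 0.333
```

In this toy case the missed short phrase costs 2.5% under the duration-weighted score but one third of the utterances under the per-utterance score, which is the kind of imbalance an utterance-level metric is meant to expose.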
Abstract
The conversation scenario is one of the most important and most challenging scenarios for speech processing technologies because people in conversation respond to each other in a casual style. Detecting the speech activities of each person in a conversation is vital to downstream tasks such as natural language processing and machine translation. The technology for detecting "who spoke when" is known as speaker diarization (SD). Diarization error rate (DER) has long been the standard evaluation metric for SD systems. However, DER fails to give enough importance to short conversational phrases, which are brief but important on the semantic level. Also, a carefully and accurately hand-annotated test dataset suitable for evaluating conversational SD technologies is still unavailable in the speech community. In this paper, we design and…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Topic Modeling
Methods: Test
