The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines

Gaofeng Cheng; Yifan Chen; Runyan Yang; Qingxuan Li; Zehui Yang; Lingxuan Ye; Pengyuan Zhang; Qingqing Zhang; Lei Xie; Yanmin Qian; Kong Aik Lee; Yonghong Yan

arXiv:2208.08042·cs.CL·August 18, 2022

TL;DR

This paper introduces the CSSD task, including a new dataset, an evaluation metric focused on conversational short phrases, and baseline methods for speaker diarization in casual conversations.

Contribution

It presents a new conversational speaker diarization task with a dedicated dataset, a novel evaluation metric (CDER), and baseline system implementation.

Findings

01 Created a 20-hour conversational speech test dataset with verified speaker timestamps.

02 Designed the conversational DER (CDER) metric to evaluate accuracy at the utterance level.

03 Established baseline results using a Variational Bayes HMM x-vector system.

Abstract

The conversation scenario is one of the most important and most challenging scenarios for speech processing technologies because people in conversation respond to each other in a casual style. Detecting the speech activities of each person in a conversation is vital to downstream tasks such as natural language processing and machine translation. The technology for detecting "who speaks when" is referred to as speaker diarization (SD). Diarization error rate (DER) has long been the standard evaluation metric for SD systems. However, DER fails to give enough importance to short conversational phrases, which are brief but important on the semantic level. In addition, a carefully and accurately manually annotated test dataset suitable for evaluating conversational SD technologies is still unavailable in the speech community. In this paper, we design and…
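The point about short phrases can be illustrated with a small sketch. The Python below is not the paper's CDER definition and not the API of the speechclub/cder_metric repository; it is a toy comparison under stated assumptions: speaker labels in the reference and hypothesis are already aligned, there is no overlapped speech and no scoring collar, and all function names (`duration_weighted_error`, `utterance_level_error`) and the `min_cover` threshold are hypothetical. It contrasts a duration-weighted error, in the spirit of DER, with an error that counts each reference utterance once regardless of its length.

```python
from typing import List, Tuple

# (start_sec, end_sec, speaker_label); hypothetical toy representation
Segment = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def correct_time(ref: Segment, hyp: List[Segment]) -> float:
    """Seconds of one reference segment that the hypothesis gives to the same speaker.

    Assumes hypothesis segments of the same speaker do not overlap each other.
    """
    start, end, spk = ref
    return sum(
        overlap(start, end, h_start, h_end)
        for h_start, h_end, h_spk in hyp
        if h_spk == spk
    )


def duration_weighted_error(ref: List[Segment], hyp: List[Segment]) -> float:
    """Simplified, DER-like score: wrongly attributed time / total reference time."""
    total = sum(end - start for start, end, _ in ref)
    correct = sum(correct_time(r, hyp) for r in ref)
    return (total - correct) / total


def utterance_level_error(
    ref: List[Segment], hyp: List[Segment], min_cover: float = 0.5
) -> float:
    """Fraction of reference utterances not mostly attributed to the right speaker.

    Each utterance counts once, however short it is. The min_cover threshold
    is an assumption made for this sketch.
    """
    wrong = sum(
        1 for r in ref if correct_time(r, hyp) < min_cover * (r[1] - r[0])
    )
    return wrong / len(ref)


# Speaker B interjects for half a second; the hypothesis misses it entirely.
ref = [(0.0, 10.0, "A"), (10.0, 10.5, "B"), (10.5, 20.0, "A")]
hyp = [(0.0, 20.0, "A")]

print(f"{duration_weighted_error(ref, hyp):.3f}")  # 0.025
print(f"{utterance_level_error(ref, hyp):.3f}")    # 0.333
```

In this toy case the missed half-second interjection costs only 2.5% under the duration-weighted score, yet it is one of three utterances under the utterance-level count, which is the kind of gap the abstract describes.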

Code & Models

Repositories

speechclub/cder_metric (official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Topic Modeling

Methods: Test