DCH-2: A Parallel Customer-Helpdesk Dialogue Corpus with Distributions of Annotators' Labels

Zhaohao Zeng; Tetsuya Sakai

arXiv:2104.08755·cs.CL·June 1, 2021

TL;DR

DCH-2 is a parallel Chinese-English dataset of 4,390 real customer-helpdesk dialogues with dialogue-level and turn-level annotations, each collected independently from 19 or 20 annotators. It is designed to support research on dialogue systems, machine translation, and what makes a customer-support interaction effective.

Contribution

The paper introduces DCH-2, a corpus of 4,390 real Chinese customer-helpdesk dialogues with English translations and annotator label distributions, built to advance research on dialogue systems and machine translation in the helpdesk domain.

Findings

01

Provides a new dataset of 4,390 dialogues with dialogue-level and turn-level annotations

02

Enables research on dialogue effectiveness and retrieval-based dialogue systems

03

Supports machine translation in the customer-service context

Abstract

We introduce a data set called DCH-2, which contains 4,390 real customer-helpdesk dialogues in Chinese and their English translations. DCH-2 also contains dialogue-level annotations and turn-level annotations obtained independently from either 19 or 20 annotators. The data set was built through our effort as organisers of the NTCIR-14 Short Text Conversation and NTCIR-15 Dialogue Evaluation tasks, to help researchers understand what constitutes an effective customer-helpdesk dialogue, and thereby build efficient and helpful helpdesk systems that are available to customers at all times. In addition, DCH-2 may be utilised for other purposes, for example, as a repository for retrieval-based dialogue systems, or as a parallel corpus for machine translation in the helpdesk domain.
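
To make the phrase "distributions of annotators' labels" concrete, the following minimal sketch shows one way that independent votes from 19 annotators could be turned into a probability distribution over labels. The function name `label_distribution`, the three-point label scale, and the vote counts are all hypothetical and chosen only for illustration; they are not taken from the DCH-2 data or its file format.

```python
from collections import Counter


def label_distribution(votes, label_set):
    """Convert one item's raw annotator votes into a probability distribution.

    `votes` is a list with one label per annotator; `label_set` lists every
    allowed label so that labels nobody chose still appear with probability 0.
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    unknown = set(counts) - set(label_set)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(votes)
    # Counter returns 0 for labels that received no votes.
    return {label: counts[label] / total for label in label_set}


# Hypothetical example: 19 annotators rate one dialogue on a made-up scale.
votes = ["good"] * 10 + ["fair"] * 6 + ["poor"] * 3
dist = label_distribution(votes, ["poor", "fair", "good"])
for label, prob in dist.items():
    print(f"{label}: {prob:.3f}")
```

Keeping the full distribution rather than a single majority label preserves disagreement among annotators, which is the kind of information the title suggests the corpus is meant to retain.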

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · Natural Language Processing Techniques