ClidSum: A Benchmark Dataset for Cross-Lingual Dialogue Summarization

Jiaan Wang; Fandong Meng; Ziyao Lu; Duo Zheng; Zhixu Li; Jianfeng Qu; Jie Zhou

arXiv:2202.05599 · cs.CL · October 18, 2022 · 1 citation

TL;DR

ClidSum is a benchmark dataset for cross-lingual dialogue summarization, with 67k+ dialogues and 112k+ annotated summaries in different target languages. The paper benchmarks pipeline and end-to-end baselines on it and proposes a new pre-trained model, mDialBART.

Contribution

The paper provides a new benchmark dataset, two benchmark settings (supervised and semi-supervised), baseline systems, and a novel pre-trained model, mDialBART, for cross-lingual dialogue summarization.

Findings

01. mDialBART outperforms pipeline models on ClidSum

02. Extensive experiments and analyses conducted

03. Challenges and future directions discussed

Abstract

We present ClidSum, a benchmark dataset for building cross-lingual summarization systems on dialogue documents. It consists of 67k+ dialogue documents from two subsets (i.e., SAMSum and MediaSum) and 112k+ annotated summaries in different target languages. Based on the proposed ClidSum, we introduce two benchmark settings for supervised and semi-supervised scenarios, respectively. We then build various baseline systems in different paradigms (pipeline and end-to-end) and conduct extensive experiments on ClidSum to provide deeper analyses. Furthermore, we propose mDialBART which extends mBART-50 (a multi-lingual BART) via further pre-training. The multiple objectives used in the further pre-training stage help the pre-trained model capture the structural characteristics as well as important content in dialogues and the transformation from source to the target language. Experimental…
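The two paradigms named in the abstract can be contrasted with a short sketch. This is an illustration only, not the authors' code: every function name below is hypothetical, the toy translator and summarizer are trivial stand-ins for real models, and the two pipeline orderings shown are the ones commonly used in cross-lingual summarization rather than something stated in the text above.

```python
from typing import Callable

# Hypothetical type aliases: each component maps a string to a string.
Translate = Callable[[str], str]
Summarize = Callable[[str], str]


def translate_then_summarize(dialogue: str, translate: Translate,
                             summarize_tgt: Summarize) -> str:
    """Pipeline: translate the dialogue, then summarize in the target language."""
    return summarize_tgt(translate(dialogue))


def summarize_then_translate(dialogue: str, summarize_src: Summarize,
                             translate: Translate) -> str:
    """Pipeline: summarize in the source language, then translate the summary."""
    return translate(summarize_src(dialogue))


def end_to_end(dialogue: str, xls_model: Summarize) -> str:
    """End-to-end: one model maps the source dialogue straight to a target summary."""
    return xls_model(dialogue)


# Toy stand-ins (assumptions for illustration; real systems would be neural models).
def toy_translate(text: str) -> str:
    return f"[de] {text}"


def toy_summarize(text: str) -> str:
    # Pretend the first utterance is the summary.
    return text.split("\n")[0]


dialogue = "Amy: Lunch at noon?\nBob: Sure."
pipeline_a = translate_then_summarize(dialogue, toy_translate, toy_summarize)
pipeline_b = summarize_then_translate(dialogue, toy_summarize, toy_translate)
direct = end_to_end(dialogue, lambda d: toy_translate(toy_summarize(d)))
```

In a pipeline, an error made by the first component is passed on to the second, whereas an end-to-end model is trained on source dialogue and target summary pairs directly.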

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification