CLTS+: A New Chinese Long Text Summarization Dataset with Abstractive Summaries
Xiaojun Liu, Shunan Zang, Chuang Zhang, Xiaojun Chen, Yangyang Ding

TL;DR
This paper introduces CLTS+, a large Chinese long text summarization dataset with highly abstractive summaries, aiming to enhance the creative ability of summarization models and to provide a new benchmark for Chinese NLP research.
Contribution
The creation of CLTS+, the first Chinese long text summarization dataset with high abstractiveness, along with an intrinsic co-occurrence metric for evaluation and baseline experiments demonstrating its utility.
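The paper's co-occurrence metric is not defined on this page. The sketch below shows one plausible reading of a co-occurrence-word measure: the share of summary tokens that also appear in the source article, where a lower share suggests a more abstractive summary. Everything here is an assumption for illustration. The function names `cooccurrence_ratio` and `novelty` are hypothetical. It tokenizes Chinese text by character instead of using a word segmenter, and it is not the CLTS+ authors' exact formula.

```python
from collections import Counter


def char_tokens(text: str) -> list:
    """Character-level tokens, skipping whitespace.
    Assumption: no Chinese word segmenter is used in this sketch."""
    return [ch for ch in text if not ch.isspace()]


def cooccurrence_ratio(article: str, summary: str) -> float:
    """Hypothetical measure: fraction of summary tokens that also occur
    in the article, with counts clipped to the article's counts.
    Not the paper's definition; an illustrative stand-in."""
    art_counts = Counter(char_tokens(article))
    sum_counts = Counter(char_tokens(summary))
    total = sum(sum_counts.values())
    if total == 0:
        return 0.0
    # Each summary token counts as shared at most as often as it appears in the article.
    shared = sum(min(c, art_counts[tok]) for tok, c in sum_counts.items())
    return shared / total


def novelty(article: str, summary: str) -> float:
    """Hypothetical abstractiveness score: 1 minus the co-occurrence ratio."""
    return 1.0 - cooccurrence_ratio(article, summary)


if __name__ == "__main__":
    article = "国务院常务会议部署稳就业措施"
    summary = "会议安排就业工作"
    print(cooccurrence_ratio(article, summary), novelty(article, summary))
```

Clipping counts keeps a summary from scoring as fully extractive just by repeating a character that appears once in the article.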
Findings
CLTS+ contains over 180K article-summary pairs.
The dataset exhibits high levels of abstractiveness and difficulty.
Baseline models trained on CLTS+ show improved creative summarization abilities.
Abstract
The lack of creative ability in abstractive methods is a particular problem in automatic text summarization: the summaries generated by models are mostly extracted from the source articles. One of the main causes of this problem is the lack of datasets with high abstractiveness, especially for Chinese. To address this, we paraphrase the reference summaries in CLTS, the Chinese Long Text Summarization dataset, correct factual-inconsistency errors, and propose CLTS+, the first Chinese long text summarization dataset with a high level of abstractiveness, which contains more than 180K article-summary pairs and is available online. Additionally, we introduce an intrinsic metric based on co-occurrence words to evaluate the dataset we constructed. We analyze the extraction strategies used in CLTS+ summaries against other datasets to quantify the abstractiveness and difficulty of…
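The abstract mentions analyzing extraction strategies to quantify abstractiveness. This page does not say which statistics were used. A widely used option is the pair of extractive-fragment measures from Grusky et al. (2018, Newsroom). Coverage is the fraction of summary tokens that fall inside text copied from the article. Density is the average squared length of the copied fragments, so long verbatim spans push it up. The sketch below implements that greedy fragment matching. Two assumptions apply: it uses character-level tokens for Chinese, and the function names are hypothetical. It does not reproduce the paper's own analysis.

```python
def extractive_fragments(article: list, summary: list) -> list:
    """Greedy shared-fragment matching in the style of Grusky et al. (2018).
    For each summary position, keep the longest span also found in the article."""
    fragments = []
    i = 0
    while i < len(summary):
        best = []
        j = 0
        while j < len(article):
            if summary[i] == article[j]:
                # Extend the match as far as both sequences agree.
                i2, j2 = i, j
                while i2 < len(summary) and j2 < len(article) and summary[i2] == article[j2]:
                    i2 += 1
                    j2 += 1
                if i2 - i > len(best):
                    best = summary[i:i2]
                j = j2
            else:
                j += 1
        if best:
            fragments.append(best)
        # Skip past the matched fragment, or advance one token if nothing matched.
        i += max(len(best), 1)
    return fragments


def coverage_density(article: str, summary: str) -> tuple:
    """Return (coverage, density) over character tokens, ignoring whitespace."""
    a = [ch for ch in article if not ch.isspace()]
    s = [ch for ch in summary if not ch.isspace()]
    if not s:
        return 0.0, 0.0
    frags = extractive_fragments(a, s)
    coverage = sum(len(f) for f in frags) / len(s)
    density = sum(len(f) ** 2 for f in frags) / len(s)
    return coverage, density


if __name__ == "__main__":
    print(coverage_density("国务院常务会议部署稳就业措施", "会议安排就业工作"))
```

Under these measures, a highly abstractive dataset should show lower coverage and density than extractive-style references.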
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
