SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization
Bogdan Gliwa, Iwona Mochol, Maciej Biesek, Aleksander Wawer

TL;DR
The paper introduces the SAMSum Corpus, a high-quality dataset of chat dialogues paired with human-written abstractive summaries, and shows that abstractive dialogue summarization poses distinct challenges that call for dedicated models and evaluation methods.
Contribution
It presents what the authors describe as the first high-quality chat-dialogue corpus manually annotated with abstractive summaries, and analyzes how current summarization models and evaluation metrics perform on this task.
Findings
Model summaries achieve higher ROUGE scores on dialogues than on news articles.
Human evaluators' judgments do not agree with this ROUGE-based comparison, indicating the need for better evaluation metrics.
Dialogue summarization remains a challenging task requiring dedicated models.
Abstract
This paper introduces the SAMSum Corpus, a new dataset with abstractive dialogue summaries. We investigate the challenges it poses for automated summarization by testing several models and comparing their results with those obtained on a corpus of news articles. We show that model-generated summaries of dialogues achieve higher ROUGE scores than model-generated summaries of news, in contrast with human evaluators' judgement. This suggests that the challenging task of abstractive dialogue summarization requires dedicated models and non-standard quality measures. To our knowledge, our study is the first attempt to introduce a high-quality chat-dialogues corpus, manually annotated with abstractive summaries, which can be used by the research community for further studies.
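The comparison above rests on ROUGE, which scores a generated summary by its n-gram overlap with a reference summary. The sketch below is a simplified ROUGE-N (precision, recall, F1 from clipped n-gram counts), assuming lowercase whitespace tokenization with no stemming or stopword handling; it is not the official ROUGE package or the exact scorer used in the paper, and the example strings are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N returning (precision, recall, F1).

    Assumption: lowercase whitespace tokenization, no stemming or stopword
    removal, so scores will differ from the official ROUGE implementation.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps min(count_cand, count_ref) per n-gram (clipping).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical candidate/reference pair, for illustration only.
candidate = "anna will buy the tickets"
reference = "anna will buy tickets for the concert"
print(rouge_n(candidate, reference, n=1))  # F1 = 5/6
print(rouge_n(candidate, reference, n=2))  # F1 = 0.4
```

Because the score counts only surface n-gram overlap with a reference, it says nothing directly about whether a summary is faithful to who said what in a dialogue, which is one reason a metric like this can diverge from human judgement.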
