DialogSum Challenge: Results of the Dialogue Summarization Shared Task

Yulong Chen; Naihao Deng; Yang Liu; Yue Zhang

arXiv:2208.03898 · cs.CL · September 7, 2022 · 1 citation

TL;DR

The DialogSum Challenge, a shared task at INLG 2022, compared several dialogue summarization systems. The submissions improve substantially over the baselines on automatic metrics such as ROUGE, but human evaluation still finds a salient gap between model outputs and human-annotated summaries, which underscores the difficulty of the task.

Contribution

This paper reports the results of a shared task on summarizing real-life scenario dialogues, compares the participating teams' approaches, and argues that more fine-grained evaluation metrics are needed.

Findings

01 Submitted systems outperform the baseline models on ROUGE scores

02 Human evaluation shows a salient gap between model outputs and human-annotated summaries

03 Dialogue summarization remains a challenging task

Abstract

We report the results of the DialogSum Challenge, the shared task on summarizing real-life scenario dialogues at INLG 2022. Four teams participated in this shared task and three submitted system reports, exploring different methods to improve the performance of dialogue summarization. Although there is a great improvement over the baseline models on automatic evaluation metrics, such as ROUGE scores, human evaluation from multiple aspects reveals a salient gap between model-generated outputs and human-annotated summaries. These findings demonstrate the difficulty of dialogue summarization and suggest that more fine-grained evaluation metrics are needed.
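
To make the gap between ROUGE and human judgment concrete, here is a minimal sketch of a ROUGE-1 F1 score. It is an illustration only, not the challenge's official scorer: the function name `rouge1_f1`, the whitespace tokenization, and the example sentences are all assumptions introduced here.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference summary.

    Simplified sketch: lowercases and splits on whitespace, with no
    stemming or punctuation handling (real scorers usually do more).
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the candidate swaps who lends the book to whom.
reference = "Tom lends Amy the book"
candidate = "Amy lends Tom the book"
print(rouge1_f1(candidate, reference))  # 1.0
```

Because unigram overlap ignores word order, the swapped-speaker candidate gets a perfect score even though a human reader would judge it factually wrong. This is one simple way an automatic metric and human evaluation can disagree.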

Code & Models

Repositories

cylnlp/DialogSum

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems