Event-based evaluation of abstractive news summarization
Huiling You, Samia Touileb, Erik Velldal, Lilja Øvrelid

TL;DR
This paper introduces an event-based evaluation method for abstractive news summaries that measures event overlap between generated summaries, reference summaries, and the original articles to better assess content quality.
Contribution
It proposes a novel evaluation metric for summarization based on event overlap, using a richly annotated Norwegian dataset to provide more insight into the event content of summaries.
Findings
Event overlap correlates with summary quality.
The method offers more detailed content analysis.
The approach enhances the evaluation of abstractive summarization.
Abstract
An abstractive summary of a news article contains the article's most important information in condensed form. The evaluation of summaries automatically generated by language models relies heavily on human-authored summaries as gold references, against which overlapping units or similarity scores are calculated. News articles report events, and ideally so should their summaries. In this work, we propose to evaluate the quality of abstractive summaries by calculating the overlapping events between generated summaries, reference summaries, and the original news articles. We experiment on a richly annotated Norwegian dataset comprising both event annotations and summaries authored by expert human annotators. Our approach provides more insight into the event information contained in the summaries.
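The abstract describes scoring a summary by the events it shares with the reference summary and the source article, but does not spell out the matching procedure. The sketch below is a minimal illustration under stated assumptions: each event is reduced to a hypothetical (event type, trigger) pair, events match only on exact equality, and overlap is reported as multiset precision, recall and F1. The names `event_overlap` and `evaluate_summary`, the event types, and the sample data are illustrative only, not the authors' implementation or annotation schema.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Sequence, Tuple

# ASSUMPTION: an event is a hypothetical (event_type, trigger) pair.
# The paper's real event schema and matching criteria may differ.
Event = Tuple[str, str]


class Overlap(NamedTuple):
    precision: float
    recall: float
    f1: float


def event_overlap(candidate: Iterable[Event], reference: Iterable[Event]) -> Overlap:
    """Exact-match multiset overlap of candidate events against reference events."""
    cand = Counter(candidate)
    ref = Counter(reference)
    # Counter intersection keeps the minimum count of each shared event.
    matched = sum((cand & ref).values())
    n_cand = sum(cand.values())
    n_ref = sum(ref.values())
    precision = matched / n_cand if n_cand else 0.0
    recall = matched / n_ref if n_ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Overlap(precision, recall, f1)


def evaluate_summary(
    generated: Sequence[Event], reference: Sequence[Event], article: Sequence[Event]
) -> Dict[str, Overlap]:
    """Compare a generated summary with both the reference summary and the article."""
    return {
        "vs_reference": event_overlap(generated, reference),
        "vs_article": event_overlap(generated, article),
        "reference_vs_article": event_overlap(reference, article),
    }


# Hypothetical toy data: events extracted from one article and its two summaries.
article = [("Attack", "angrep"), ("Die", "drept"), ("Arrest", "arrestert"), ("Meet", "samtaler")]
reference = [("Attack", "angrep"), ("Die", "drept"), ("Arrest", "arrestert")]
generated = [("Attack", "angrep"), ("Die", "drept"), ("Meet", "samtaler")]

scores = evaluate_summary(generated, reference, article)
for name, overlap in scores.items():
    print(name, overlap)
```

In this toy case the generated summary shares two of its three events with the reference, yet every one of its events also occurs in the article. That kind of distinction, an event missing from the reference but still faithful to the source, is what comparing against both the reference and the article is meant to expose, whereas reference-only overlap would count the third event as an error.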
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Text and Document Classification Technologies
