Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles
Samia Touileb, Vladislav Mikhailov, Marie Kroka, Lilja Øvrelid, Erik Velldal

TL;DR
This paper introduces a high-quality Norwegian news summarization dataset with multiple human-authored summaries in two written variants, serving as a benchmark for evaluating and improving generative language models in Norwegian.
Contribution
It provides a new Norwegian news summarization dataset with three human-authored gold-standard summaries per article, each available in both Bokmål and Nynorsk, and evaluates existing open LLMs for Norwegian on this benchmark.
Findings
Existing open LLMs struggle with Norwegian summarization.
The dataset offers a challenging benchmark for future model development.
A manual human evaluation compares human-authored summaries with model-generated ones.
Abstract
We introduce a dataset of high-quality human-authored summaries of news articles in Norwegian. The dataset is intended for benchmarking the abstractive summarisation capabilities of generative language models. Each document in the dataset is provided with three different candidate gold-standard summaries written by native Norwegian speakers, and all summaries are provided in both of the written variants of Norwegian -- Bokmål and Nynorsk. The paper describes the data creation effort as well as an evaluation of existing open LLMs for Norwegian on the dataset. We also provide insights from a manual human evaluation, comparing human-authored to model-generated summaries. Our results indicate that the dataset provides a challenging LLM benchmark for Norwegian summarisation capabilities.
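Because each article comes with three gold-standard summaries, a model output can be scored against all of them rather than a single reference. The following is a minimal sketch of that idea, not the paper's evaluation code: the metric (a simple unigram-overlap F1), the function names, and the toy strings are all assumptions made for illustration, and the abstract does not state which metrics the authors use.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two whitespace-tokenised strings.

    Illustrative stand-in metric; not taken from the paper.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at most
    # as often as it appears in both strings.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def multi_reference_score(candidate: str, references: list[str]) -> float:
    """Score a candidate against several references, keeping the best match."""
    if not references:
        raise ValueError("at least one reference summary is required")
    return max(unigram_f1(candidate, ref) for ref in references)


# Hypothetical record: invented placeholder text, not from the dataset.
references = [
    "regjeringen legger frem nytt budsjett",
    "nytt budsjett presenteres av regjeringen",
    "budsjettet for neste år er klart",
]
model_summary = "regjeringen presenterer nytt budsjett"
score = multi_reference_score(model_summary, references)
```

Taking the maximum over references rewards a summary that matches any one valid human phrasing; averaging over the references is an equally plausible design choice.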
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
