On the State of German (Abstractive) Text Summarization

Dennis Aumiller; Jing Fan; Michael Gertz

arXiv:2301.07095·cs.CL·January 18, 2023

TL;DR

This paper evaluates the current state of German abstractive text summarization, highlighting dataset flaws, evaluation biases, and the underperformance of existing systems compared to simple baselines.

Contribution

It provides a comprehensive analysis of German summarization datasets and systems, identifying key issues and offering tools for improved evaluation and dataset filtering.

Findings

01. Over 50% of the MLSUM dataset is unsuitable for abstractive summarization.

02. Existing systems often underperform simple extractive baselines.

03. Evaluation scores can drop by over 20 ROUGE-1 points after dataset cleaning.
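
To make the terms "simple extractive baseline" and "ROUGE-1" concrete, the following is a minimal sketch and not the authors' evaluation code. It assumes a lead-k baseline (take the first k sentences of the article as the summary) and a simplified ROUGE-1 F1 based on clipped unigram overlap, with naive regex tokenization and sentence splitting and no stemming. The function names `tokenize`, `rouge1_f1`, and `lead_k` are hypothetical and chosen for illustration only.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens (naive; umlauts and ß are kept)."""
    return re.findall(r"\w+", text.lower())


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, no stemming."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # each unigram counted at most min(cand, ref) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def lead_k(article: str, k: int = 3) -> str:
    """Assumed baseline: return the first k sentences (naive punctuation split)."""
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    return " ".join(sentences[:k])
```

Scoring `lead_k(article)` against the reference summary with `rouge1_f1` gives a floor that any abstractive system would be expected to beat. Published ROUGE implementations differ in tokenization and stemming, so absolute values from this sketch are not comparable to reported scores.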

Abstract

With recent advancements in the area of Natural Language Processing, the focus is slowly shifting from a purely English-centric view towards more language-specific solutions, including German. Especially practical for businesses to analyze their growing amount of textual data are text summarization systems, which transform long input documents into compressed and more digestible summary texts. In this work, we assess the particular landscape of German abstractive text summarization and investigate the reasons why practically useful solutions for abstractive text summarization are still absent in industry. Our focus is two-fold, analyzing a) training resources, and b) publicly available summarization systems. We are able to show that popular existing datasets exhibit crucial flaws in their assumptions about the original sources, which frequently leads to detrimental effects on system…

Code & Models

Repositories

dennlinger/summaries (official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification