On the State of German (Abstractive) Text Summarization
Dennis Aumiller, Jing Fan, Michael Gertz

TL;DR
This paper evaluates the current state of German abstractive text summarization, highlighting dataset flaws, evaluation biases, and the underperformance of existing systems compared to simple baselines.
Contribution
It provides a comprehensive analysis of German summarization datasets and systems, identifying key issues and offering tools for improved evaluation and dataset filtering.
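To make the idea of dataset filtering concrete, the following is a minimal sketch and not the authors' tool. The function names `is_suitable` and `filter_pairs` and all three criteria (empty fields, a summary no shorter than its article, a summary copied verbatim from the article) are assumptions chosen for illustration. The paper's actual filtering rules may differ.

```python
def is_suitable(article: str, summary: str) -> bool:
    """Hypothetical check for whether a pair is usable for abstractive training."""
    article, summary = article.strip(), summary.strip()
    # Assumed criterion 1: both fields must be non-empty.
    if not article or not summary:
        return False
    # Assumed criterion 2: a summary should be shorter than its source.
    if len(summary) >= len(article):
        return False
    # Assumed criterion 3: a summary copied verbatim from the article
    # is purely extractive, so it is dropped.
    if summary in article:
        return False
    return True


def filter_pairs(pairs):
    """Keep only the (article, summary) pairs that pass is_suitable."""
    return [(a, s) for a, s in pairs if is_suitable(a, s)]
```

A call such as `filter_pairs(dataset_pairs)` would return the retained pairs. Comparing the list length before and after gives the share of samples removed.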
Findings
Over 50% of the MLSUM dataset is unsuitable for abstractive summarization
Existing systems often underperform simple extractive baselines
Evaluation scores can drop by over 20 ROUGE-1 points after dataset cleaning
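The findings above refer to ROUGE-1 and to simple extractive baselines. As an illustration only, the sketch below computes a unigram-overlap ROUGE-1 F1 score and builds a lead-k baseline that returns the first k sentences of an article. The regex tokenizer, the naive sentence splitter, and the default `k = 3` are assumptions and not the paper's evaluation setup. Published ROUGE implementations typically add stemming and other preprocessing, so absolute numbers will differ.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; \w also matches German umlauts and ß.
    return re.findall(r"\w+", text.lower())


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    # Clipped overlap: each token counts at most as often as in the reference.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def lead_k(article: str, k: int = 3) -> str:
    """Extractive baseline: the first k sentences of the article."""
    # Naive split after ., ! or ? followed by whitespace (an assumption;
    # abbreviations and dates in German text will be split incorrectly).
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    return " ".join(sentences[:k])
```

Scores from `rouge1_f1` lie between 0 and 1. ROUGE results are usually reported multiplied by 100, so a drop of 20 ROUGE-1 points corresponds to 0.20 on this scale.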
Abstract
With recent advancements in the area of Natural Language Processing, the focus is slowly shifting from a purely English-centric view towards more language-specific solutions, including German. Especially practical for businesses to analyze their growing amount of textual data are text summarization systems, which transform long input documents into compressed and more digestible summary texts. In this work, we assess the particular landscape of German abstractive text summarization and investigate the reasons why practically useful solutions for abstractive text summarization are still absent in industry. Our focus is two-fold, analyzing a) training resources, and b) publicly available summarization systems. We are able to show that popular existing datasets exhibit crucial flaws in their assumptions about the original sources, which frequently leads to detrimental effects on system…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
