NEWTS: A Corpus for News Topic-Focused Summarization
Seyed Ali Bahrainian, Sheridan Feucht, Carsten Eickhoff

TL;DR
NEWTS is a dataset for topic-focused news summarization. It supports evaluating models that condition their summaries on a specific theme, a task that existing summarization benchmarks do not cover.
Contribution
The paper introduces NEWTS, the first corpus for topic-focused summarization. It is built on the CNN/Dailymail dataset and annotated via online crowd-sourcing; each source article is paired with two reference summaries, each focusing on a different theme.
Findings
Existing models show varied effectiveness on topic-focused tasks.
Prompting methods significantly influence summarization quality.
NEWTS enables systematic evaluation of theme-conditioned summarization techniques.
Abstract
Text summarization models are approaching human levels of fidelity. Existing benchmarking corpora provide concordant pairs of full and abridged versions of Web, news, or professional content. To date, all summarization datasets operate under a one-size-fits-all paradigm that may not reflect the full range of organic summarization needs. Several recently proposed models (e.g., plug and play language models) have the capacity to condition the generated summaries on a desired range of themes. These capacities remain largely unused and unevaluated as there is no dedicated dataset that would support the task of topic-focused summarization. This paper introduces the first topical summarization corpus NEWTS, based on the well-known CNN/Dailymail dataset, and annotated via online crowd-sourcing. Each source article is paired with two reference summaries, each focusing on a different theme of…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
