NewsEdits: A News Article Revision Dataset and a Document-Level Reasoning Challenge
Alexander Spangher, Xiang Ren, Jonathan May, Nanyun Peng

TL;DR
This paper introduces NewsEdits, a large multilingual dataset of news article revisions, and explores the predictability of edit actions to aid journalistic analysis and narrative understanding.
Contribution
It provides the first publicly available dataset of news revision histories, which is large-scale and multilingual, and develops a high-accuracy extraction algorithm to identify edit actions, enabling new research in news evolution and prediction.
Findings
Added and deleted sentences are more likely than unchanged sentences to contain updating events, main content, and quotes.
Predicting edit actions is feasible for humans but challenging for NLP models.
The dataset spans 15 years (2006-2021) and includes 1.2 million articles with 4.6 million versions from over 22 English- and French-language newspaper sources based in three countries.
Abstract
News article revision histories provide clues to narrative and factual evolution in news articles. To facilitate analysis of this evolution, we present the first publicly available dataset of news revision histories, NewsEdits. Our dataset is large-scale and multilingual; it contains 1.2 million articles with 4.6 million versions from over 22 English- and French-language newspaper sources based in three countries, spanning 15 years of coverage (2006-2021). We define article-level edit actions: Addition, Deletion, Edit and Refactor, and develop a high-accuracy extraction algorithm to identify these actions. To underscore the factual nature of many edit actions, we conduct analyses showing that added and deleted sentences are more likely to contain updating events, main content and quotes than unchanged sentences. Finally, to explore whether edit actions are predictable, we introduce…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
