Understanding news story chains using information retrieval and network clustering techniques
Tom Nicholls, Jonathan Bright

TL;DR
This paper introduces an automated method combining information retrieval and network clustering techniques to identify and analyze news story chains within large article corpora, revealing that over half of news production occurs within stories.
Contribution
The paper presents a novel, automated approach for detecting linked news stories using textual similarity and network clustering, enabling large-scale analysis of news event structures.
Findings
Over 50% of news articles are part of stories.
The method efficiently identifies valid story clusters.
Application to 61,864 articles demonstrates scalability.
Abstract
Content analysis of news stories (whether manual or automatic) is a cornerstone of the communication studies field. However, much research is conducted at the level of individual news articles, despite the fact that news events (especially significant ones) are frequently presented as "stories" by news outlets: chains of connected articles covering the same event from different angles. These stories are theoretically highly important in terms of increasing public recall of news items and enhancing the agenda-setting power of the press. Yet thus far, the field has lacked an efficient method for detecting groups of articles which form stories in a way that enables their analysis. In this work, we present a novel, automated method for identifying linked news stories from within a corpus of articles. This method makes use of techniques drawn from the field of information retrieval to…
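The general idea of pairing textual similarity with network clustering can be illustrated with a minimal sketch. It is not the authors' implementation: the summary above does not specify the retrieval scoring or the clustering algorithm, so TF-IDF cosine similarity and connected components are used here as simple stand-ins. The function names and the `threshold` value of 0.3 are assumptions chosen for illustration only.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def story_clusters(docs, threshold=0.3):
    """Group articles into candidate stories.

    Articles are nodes; an edge links two articles whose similarity is at
    least `threshold` (a hypothetical cut-off). Each connected component of
    the resulting network is returned as one cluster of article indices.
    """
    vectors = tfidf_vectors(docs)
    neighbours = defaultdict(set)
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                neighbours[i].add(j)
                neighbours[j].add(i)

    seen = set()
    clusters = []
    for start in range(len(docs)):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters
```

In this sketch, a cluster with two or more articles is a candidate story and a singleton is a standalone article. The all-pairs comparison is quadratic in the number of articles, so a corpus of tens of thousands of articles would in practice need an index to shortlist candidate pairs, and a community-detection algorithm may separate stories more cleanly than connected components.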
