Real-time News Story Identification
Tadej Škvorc, Nikola Ivačič, Sebastjan Hribar, Marko Robnik-Šikonja

TL;DR
This paper introduces a real-time news story identification system that groups online news articles into specific stories based on events, places, and people, using a combination of text representations, clustering, and online topic modeling.
Contribution
It presents a novel real-time approach combining multiple text representation and online topic modeling techniques for accurate news story grouping.
Findings
The approach produces sensible, human-evaluable story groupings.
Combining various online topic models improves story identification accuracy.
The system effectively processes news articles in real time.
Abstract
To improve the reading experience, many news sites organize news into topical collections, called stories. In this work, we present an approach for implementing real-time story identification for a news monitoring system that automatically collects news articles as they appear online and processes them in various ways. Story identification aims to assign each news article to a specific story that the article is covering. The process is similar to text clustering and topic modeling, but requires that articles be grouped based on particular events, places, and people, rather than general text similarity (as in clustering) or general (predefined) topics (as in topic modeling). We present an approach to story identification that is capable of functioning in real time, assigning articles to stories as they are published online. In the proposed approach, we combine text representation…
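The real-time assignment step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function, the `StoryAssigner` class, and the `threshold` value are all hypothetical stand-ins chosen for illustration. The sketch shows only the general idea of assigning each incoming article to the most similar existing story, or opening a new story when nothing is similar enough.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: a stand-in for a real text representation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class StoryAssigner:
    """Hypothetical incremental clusterer: one summed term vector per story."""

    def __init__(self, threshold=0.3):
        # Assumed similarity cutoff; a real system would tune this value.
        self.threshold = threshold
        self.stories = []

    def assign(self, text):
        """Return the story id for a newly published article."""
        vec = embed(text)
        best_id, best_sim = None, 0.0
        for story_id, centroid in enumerate(self.stories):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_id, best_sim = story_id, sim
        if best_id is not None and best_sim >= self.threshold:
            # Fold the article into the matched story's running vector.
            self.stories[best_id].update(vec)
            return best_id
        # No sufficiently similar story exists, so start a new one.
        self.stories.append(Counter(vec))
        return len(self.stories) - 1
```

Each call to `assign` handles one article as it arrives, so no batch re-clustering is needed. A plain word-count vector captures only general text similarity, which the abstract notes is not sufficient for story identification; a more faithful variant would presumably weight named events, places, and people more heavily.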
Taxonomy
Topics: Complex Network Analysis Techniques · Video Analysis and Summarization · Topic Modeling
