Story Disambiguation: Tracking Evolving News Stories across News and Social Streams
Bichen Shi, Thanh-Binh Le, Neil Hurley, Georgiana Ifrim

TL;DR
This paper introduces a novel cross-domain story tracking framework called Story Disambiguation, which leverages real-time entity disambiguation and learning-to-rank techniques to effectively track evolving news stories across diverse sources and formats.
Contribution
The paper presents an approach that combines entity graphs with semi-supervised learning to track stories accurately and in real time across multiple domains, using less labeled data.
Findings
Outperforms state-of-the-art methods in accuracy for mixed-domain streams.
Requires less labeled data to seed stories.
Effective in tracking local and complex stories in noisy environments.
Abstract
Following a particular news story online is an important but difficult task, as the relevant information is often scattered across different domains/sources (e.g., news articles, blogs, comments, tweets), presented in various formats and language styles, and may overlap with thousands of other stories. In this work we join the areas of topic tracking and entity disambiguation, and propose a framework named Story Disambiguation - a cross-domain story tracking approach that builds on real-time entity disambiguation and a learning-to-rank framework to represent and update the rich semantic structure of news stories. Given a target news story, specified by a seed set of documents, the goal is to effectively select new story-relevant documents from an incoming document stream. We represent stories as entity graphs and we model the story tracking problem as a learning-to-rank task. This…
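As a rough illustration of the idea in the abstract (stories represented as entity graphs, incoming documents ranked by relevance to a story), the following minimal Python sketch may help. It is not the authors' implementation. It assumes each document has already been reduced to a list of disambiguated entity names, and it swaps the paper's learned ranking model for a hand-weighted linear score over two overlap features. All function names, weights, and the example entities are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_entity_graph(docs):
    """Count entities (nodes) and entity co-occurrences (edges) over seed documents.

    `docs` is a list of entity-name lists, one per document (assumed input format).
    """
    nodes, edges = Counter(), Counter()
    for entities in docs:
        unique = sorted(set(entities))          # sort so each edge has one canonical form
        nodes.update(unique)
        edges.update(combinations(unique, 2))   # every unordered pair in the document
    return nodes, edges


def score(doc_entities, nodes, edges, w_node=0.5, w_edge=0.5):
    """Hand-weighted stand-in for a learned ranker: node overlap plus edge overlap."""
    unique = sorted(set(doc_entities))
    if not unique:
        return 0.0
    node_overlap = sum(1 for e in unique if e in nodes) / len(unique)
    pairs = list(combinations(unique, 2))
    edge_overlap = sum(1 for p in pairs if p in edges) / len(pairs) if pairs else 0.0
    return w_node * node_overlap + w_edge * edge_overlap


def rank_stream(stream, nodes, edges):
    """Order (doc_id, entities) pairs from most to least story-relevant."""
    return sorted(stream, key=lambda d: score(d[1], nodes, edges), reverse=True)


# Made-up seed set and incoming stream, for illustration only.
seed = [["CityCouncil", "TramStrike", "UnionX"], ["TramStrike", "UnionX", "OperatorY"]]
nodes, edges = build_entity_graph(seed)
stream = [
    ("d1", ["TramStrike", "UnionX"]),
    ("d2", ["CityCouncil", "PhoneMaker"]),
    ("d3", ["PhoneMaker", "SearchCo"]),
]
ranked = rank_stream(stream, nodes, edges)
print([doc_id for doc_id, _ in ranked])
```

In a setup closer to the one the abstract describes, the two overlap values would be features fed to a trained learning-to-rank model, and the graph would be updated as new relevant documents are accepted. The fixed weights here only keep the sketch self-contained.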
Taxonomy
Topics: Web Data Mining and Analysis · Text and Document Classification Technologies · Spam and Phishing Detection
