TL;DR
This paper presents Story Forest, an online system that organizes massive streams of breaking news into evolving story structures, enabling accurate event detection and coherent story evolution in real time.
Contribution
It introduces a novel online clustering and event-linking approach that constructs dynamic story trees from streaming news data, addressing the real-world challenges of redundant content and timely updates.
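To make the idea of online clustering plus event linking concrete, here is a minimal sketch. It is not the paper's algorithm: the Jaccard keyword similarity, the two thresholds, and the names `StoryForestSketch`, `Event`, `same_event`, and `same_story` are all assumptions chosen for illustration. It only shows the general shape: each incoming document either joins an existing event or starts a new one, and a new event is attached to an existing story tree or becomes the root of a new tree, without moving nodes that were placed earlier.

```python
from dataclasses import dataclass, field


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two keyword sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Event:
    keywords: set
    docs: list = field(default_factory=list)
    children: list = field(default_factory=list)  # later events linked under this one


class StoryForestSketch:
    """Hypothetical, simplified stand-in for an online story-tree builder."""

    def __init__(self, same_event: float = 0.5, same_story: float = 0.2):
        # Both thresholds are illustrative assumptions, not values from the paper.
        self.same_event = same_event
        self.same_story = same_story
        self.roots = []   # one root event per story tree
        self.events = []  # every event seen so far, in arrival order

    def add(self, doc_id: str, keywords) -> Event:
        kw = set(keywords)
        # Find the most similar existing event, if any.
        best = max(self.events, key=lambda e: jaccard(kw, e.keywords), default=None)
        score = jaccard(kw, best.keywords) if best is not None else 0.0

        if best is not None and score >= self.same_event:
            # Redundant coverage of a known event: merge into it.
            best.docs.append(doc_id)
            best.keywords |= kw
            return best

        # Otherwise the document starts a new event.
        event = Event(keywords=kw, docs=[doc_id])
        if best is not None and score >= self.same_story:
            best.children.append(event)  # extend an existing story tree
        else:
            self.roots.append(event)     # start a new story tree
        self.events.append(event)
        return event


forest = StoryForestSketch()
e1 = forest.add("d1", ["quake", "japan", "tsunami", "tokyo"])
e2 = forest.add("d2", ["quake", "japan", "tsunami", "warning"])
e3 = forest.add("d3", ["quake", "japan", "rescue", "aid", "relief"])
e4 = forest.add("d4", ["election", "vote", "senate"])
```

In this toy run, `d2` is merged into the event created by `d1`, `d3` becomes a child event in the same tree, and `d4` starts a second tree. Because nodes are only ever appended, earlier parts of a tree stay where they are as new documents arrive.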
Findings
Outperforms existing algorithms in event detection accuracy.
Effectively organizes news into coherent, evolving story trees.
Validated on 60 GB of Chinese news data.
Abstract
We describe our experience of implementing a news content organization system at Tencent that discovers events from vast streams of breaking news and evolves news story structures in an online fashion. Our real-world system has distinct requirements in contrast to previous studies on topic detection and tracking (TDT) and event timeline or graph generation, in that we 1) need to accurately and quickly extract distinguishable events from massive streams of long text documents that cover diverse topics and contain highly redundant information, and 2) must develop the structures of event stories in an online manner, without repeatedly restructuring previously formed stories, in order to guarantee a consistent user viewing experience. In solving these challenges, we propose Story Forest, a set of online schemes that automatically clusters streaming documents into events, while connecting…
