Online Event Integration with StoryPivot
Anja Gruenheid, Donald Kossmann, Divesh Srivastava

TL;DR
This paper introduces StoryPivot, an online event integration system that processes real-time news data to track the evolution of stories, balancing high-quality results with near real-time performance.
Contribution
It presents the design and evaluation of a novel online event integration system, demonstrating its effectiveness on real-world news datasets.
Findings
Effective real-time event integration achieved
Trade-offs between quality and speed analyzed
Generalizable insights for online data integration
Abstract
Modern data integration systems need to process large amounts of data from a variety of data sources and under real-time integration constraints. They are not only employed in enterprises for managing internal data but are also used for a variety of web services that use techniques such as entity resolution or data cleaning in live systems. In this work, we discuss a new generation of data integration systems that operate on (un-)structured data in an online setting, i.e., systems that process continuously modified datasets upon which the integration task is based. As an example of such a system, we use an online event integration system called StoryPivot. It observes events extracted from news articles in data sources such as the 'Guardian' or the 'Washington Post', which are integrated to show users the evolution of real-world stories over time. The design decisions for StoryPivot are…
Taxonomy
Topics: Data Quality and Management · Web Data Mining and Analysis · Advanced Database Systems and Queries
