Evaluation of Data Enrichment Methods for Distributed Stream Processing Systems
Dominik Scheinert, Fabian Casares, Morgan K. Geldenhuys, Kevin Styp-Rekowski, Odej Kao

TL;DR
This paper evaluates data enrichment techniques in distributed stream processing (DSP) systems. It shows that outsourcing enrichment can improve performance but also increases resource use, and it emphasizes the need for specialized solutions for performance-intensive cloud workloads.
Contribution
It provides an in-depth evaluation of data enrichment methods in DSP systems and identifies use cases where outsourcing enrichment enhances performance.
Findings
Outsourcing enrichment data can improve DSP system performance.
Enrichment increases resource consumption in cloud environments.
Specialized stream processing solutions are needed for performance-intensive workloads.
Abstract
Stream processing has become a critical component in the architecture of modern applications. With the exponential growth of data generated by sources such as the Internet of Things, business intelligence, and telecommunications, real-time processing of unbounded data streams has become a necessity. Distributed stream processing (DSP) systems address this challenge, offering high horizontal scalability, fault-tolerant execution, and the ability to process data streams from multiple sources in a single DSP job. Often, however, data streams must be enriched with additional information before they can be processed correctly, which introduces additional dependencies and potential bottlenecks. In this paper, we present an in-depth evaluation of data enrichment methods for DSP systems and identify the different use cases for stream processing in modern systems. Using a representative DSP system and conducting the…
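To make the idea of enrichment concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation or evaluation setup. It assumes a hypothetical sensor stream whose events carry only a `sensor_id`, and a hypothetical reference table mapping sensor IDs to locations. `enrich_remote` issues one lookup per event against an external store (simulated here by the hypothetical `CountingStore`), while `enrich_local` reads from a copy of the reference data held inside the operator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class SensorEvent:
    # Hypothetical raw event: carries only an ID and a measurement.
    sensor_id: str
    value: float


@dataclass(frozen=True)
class EnrichedEvent:
    sensor_id: str
    value: float
    location: Optional[str]  # None when no reference record exists for the ID


class CountingStore:
    """Hypothetical stand-in for an external store; counts requests it receives."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)
        self.calls = 0

    def get(self, key: str) -> Optional[str]:
        self.calls += 1  # each call models one network round trip
        return self._data.get(key)


def enrich_remote(events: Iterable[SensorEvent],
                  lookup: Callable[[str], Optional[str]]) -> Iterator[EnrichedEvent]:
    """Enrich each event with one external lookup per record."""
    for e in events:
        yield EnrichedEvent(e.sensor_id, e.value, lookup(e.sensor_id))


def enrich_local(events: Iterable[SensorEvent],
                 table: Dict[str, str]) -> Iterator[EnrichedEvent]:
    """Enrich each event from reference data held in the operator's memory."""
    for e in events:
        yield EnrichedEvent(e.sensor_id, e.value, table.get(e.sensor_id))


# Hypothetical demo data: "s3" has no reference record.
reference = {"s1": "hall-A", "s2": "hall-B"}
stream = [SensorEvent("s1", 20.5), SensorEvent("s2", 21.0), SensorEvent("s3", 19.8)]

store = CountingStore(reference)
remote = list(enrich_remote(stream, store.get))
local = list(enrich_local(stream, reference))
print(store.calls)  # 3: one external request per event
```

Both variants produce the same enriched records; they differ only in where the enrichment data lives. The remote variant makes one request per event, which is the kind of extra dependency and potential bottleneck the abstract mentions, while the local variant avoids those requests at the cost of memory spent holding the table inside the operator.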
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Database Systems and Queries · Distributed Systems and Fault Tolerance
