Optimizing Multiple Multi-Way Stream Joins
Manuel Dossinger, Sebastian Michel

TL;DR
This paper introduces an ILP-based approach for jointly optimizing multiple multi-way stream joins in a scale-out architecture. It selects the partitioning and tuple routing with minimal probe load and rewires routing and operator placement dynamically as data characteristics and the set of queries change.
Contribution
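It adapts prior multi-way stream join techniques to predicate-driven data partitioning, formulates an ILP for selecting the partitioning and tuple routing with minimal probe load, and implements these in CLASH, a data stream processor that translates optimized queries into deployable Apache Storm topologies.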
Findings
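The ILP formulation selects partitioning and tuple routing with minimal probe load
Routing and operator placement can be rewired dynamically as data characteristics change or queries arrive and expire
Experiments on real-world data show the potential of multi-query optimization of multi-way stream joins and the feasibility of the ILP approach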
Abstract
We address the joint optimization of multiple stream joins in a scale-out architecture by tailoring prior work on multi-way stream joins to predicate-driven data partitioning schemes. We present an integer linear programming (ILP) formulation for selecting the partitioning and tuple routing with minimal probe load and describe how routing and operator placement can be rewired dynamically when data characteristics change or queries arrive or expire. The presented algorithms and optimization schemes are implemented in CLASH, a data stream processor developed in our group that translates queries to deployable Apache Storm topologies after optimization. The experiments conducted over real-world data exhibit the potential of multi-query optimization of multi-way stream joins and the effectiveness and feasibility of the ILP optimization problem.
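To make the notion of "minimal probe load" concrete, the following is a toy sketch and not the paper's ILP. It assumes a simple, hypothetical cost model in which every tuple (or intermediate result) sent to another relation's store counts as one unit of probe load, and it picks the cheapest probe order for one input relation by exhaustive enumeration rather than by integer linear programming. The function names, the `rates`, `sizes`, and `selectivity` parameters, and the cost formula are all illustrative assumptions.

```python
from itertools import permutations


def probe_load(start, order, rates, sizes, selectivity):
    """Estimate probe load when tuples arriving on `start` probe the stores in `order`.

    Hypothetical cost model: each intermediate tuple sent to a store costs 1.
    A relation pair without an entry in `selectivity` has no join predicate
    and is treated as a cross product (selectivity 1.0).
    """
    load = 0.0
    intermediate = float(rates[start])  # tuples per time unit entering the first probe
    joined = [start]
    for target in order:
        load += intermediate  # every intermediate tuple probes the target store
        sel = 1.0
        for rel in joined:
            sel *= selectivity.get(frozenset((rel, target)), 1.0)
        # Expected number of join results that continue to the next probe.
        intermediate *= sizes[target] * sel
        joined.append(target)
    return load


def best_probe_order(start, relations, rates, sizes, selectivity):
    """Return (order, load) with the lowest estimated probe load for `start`.

    Exhaustive search over all probe orders; only feasible for a handful of
    relations. It stands in for a real optimizer in this sketch.
    """
    others = [rel for rel in relations if rel != start]
    best = None
    for order in permutations(others):
        load = probe_load(start, order, rates, sizes, selectivity)
        if best is None or load < best[1]:
            best = (order, load)
    return best
```

Under these assumptions, a relation joined through a selective predicate is probed first, because it shrinks the intermediate result that has to be forwarded to later stores. An ILP formulation such as the one described in the abstract would additionally decide partitioning and share probe steps across queries, which this sketch does not attempt.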
