Parallel Stream Processing Against Workload Skewness and Variance
Junhua Fang, Rong Zhang, Tom Z.J. Fu, Zhenjie Zhang, Aoying Zhou, Junhua Zhu

TL;DR
This paper introduces a dynamic workload partitioning framework for parallel stream processing that adaptively rebalances key-based workloads when the incoming key distribution varies, minimizing state migration costs and improving performance over existing methods.
Contribution
It proposes a novel hybrid routing strategy and an optimization-based approach for dynamic workload rebalancing in stateful stream processing systems.
Findings
Effective workload rebalancing under workload variance.
Significant reduction in state migration costs.
Improved performance over existing solutions.
Abstract
Key-based workload partitioning is a common strategy used in parallel stream processing engines, enabling effective key-value tuple distribution over worker threads in a logical operator. While randomized hashing on the keys is capable of balancing the workload for key-based partitioning when the keys generally follow a static distribution, it is likely to generate poor balancing performance when workload variance occurs on the incoming data stream. This paper presents a new key-based workload partitioning framework, with practical algorithms to support dynamic workload assignment for stateful operators. The framework combines hash-based and explicit key-based routing strategies for workload distribution, which specifies the destination worker threads for a handful of keys and assigns the other keys with the hashing function. When short-term distribution fluctuations occur to the…
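The hybrid routing idea described in the abstract (an explicit destination for a handful of keys, hashing for all the others) can be illustrated with a minimal sketch. This is not the paper's implementation: the class `HybridRouter`, its methods `route` and `pin`, the helper `worker_loads`, and the choice of CRC32 as the hash function are all hypothetical names and assumptions made for illustration. How the framework chooses which keys to route explicitly is not shown here.

```python
import zlib


class HybridRouter:
    """Illustrative hybrid router (hypothetical, not the paper's code):
    a small explicit table overrides hashing for a handful of keys."""

    def __init__(self, num_workers):
        if num_workers <= 0:
            raise ValueError("num_workers must be positive")
        self.num_workers = num_workers
        self.routing_table = {}  # explicit key -> worker index

    def _hash_worker(self, key):
        # Assumption: CRC32 stands in for whatever hash the engine uses.
        return zlib.crc32(str(key).encode("utf-8")) % self.num_workers

    def route(self, key):
        # Explicit entries take precedence; all other keys are hashed.
        if key in self.routing_table:
            return self.routing_table[key]
        return self._hash_worker(key)

    def pin(self, key, worker):
        # Assign a key to a specific worker. If that worker is already the
        # hash destination, no table entry is needed, so drop any old one.
        if not 0 <= worker < self.num_workers:
            raise ValueError("worker index out of range")
        if worker == self._hash_worker(key):
            self.routing_table.pop(key, None)
        else:
            self.routing_table[key] = worker


def worker_loads(router, key_counts):
    # Sum per-key tuple counts into per-worker loads under current routing.
    loads = [0] * router.num_workers
    for key, count in key_counts.items():
        loads[router.route(key)] += count
    return loads
```

In this sketch, moving a key with `pin` changes its destination worker, which in a stateful operator would imply migrating that key's state; keeping the table small limits both memory use and the number of such moves.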
Taxonomy
Topics: Cloud Computing and Resource Management · Caching and Content Delivery · Peer-to-Peer Network Technologies
