POTUS: Predictive Online Tuple Scheduling for Data Stream Processing Systems
Xi Huang, Ziyu Shao, Yang Yang

TL;DR
This paper introduces POTUS, a predictive online tuple scheduling scheme for data stream processing systems. It casts scheduling as a stochastic network optimization problem and needs only a modest amount of predicted future information to reduce response time and balance workload.
Contribution
It formulates tuple scheduling as a stochastic optimization problem and proposes POTUS, the first predictive scheduling scheme with theoretical guarantees for data stream systems.
Findings
Achieves ultra-low response time in simulations.
Guarantees queue stability under workload fluctuations.
Reduces response time even when future predictions are inaccurate.
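To make the queue-stability finding concrete, here is a minimal toy simulation. It is not the POTUS algorithm and uses no prediction; it only contrasts a backlog-oblivious round-robin dispatcher (standing in, as an assumption, for a naive scheme) with a backlog-aware one that sends each tuple to the currently shortest queue. The function names, arrival pattern, and service rates are all hypothetical and chosen for illustration.

```python
import itertools
from typing import Callable, List, Tuple

# A dispatcher takes the current queue backlogs and returns the index of the
# instance that should receive the next tuple.
Dispatcher = Callable[[List[int]], int]


def make_round_robin() -> Dispatcher:
    """Backlog-oblivious dispatcher: cycles through instances in order."""
    counter = itertools.count()
    return lambda queues: next(counter) % len(queues)


def shortest_queue(queues: List[int]) -> int:
    """Backlog-aware dispatcher: picks the instance with the smallest backlog."""
    return min(range(len(queues)), key=queues.__getitem__)


def simulate(pick: Dispatcher,
             arrivals: List[int],
             service_rates: List[int]) -> Tuple[int, List[int]]:
    """Run a slotted simulation.

    In each time slot, arrivals[t] tuples are dispatched one by one, then each
    instance i serves up to service_rates[i] tuples.
    Returns (peak backlog observed, final backlogs).
    """
    queues = [0] * len(service_rates)
    peak = 0
    for n in arrivals:
        for _ in range(n):
            queues[pick(queues)] += 1
        peak = max(peak, max(queues))
        queues = [max(q - r, 0) for q, r in zip(queues, service_rates)]
    return peak, queues


# Hypothetical workload: 6 tuples per slot for 100 slots, three instances with
# unequal service rates. Total capacity (7 per slot) exceeds the arrival rate.
arrivals = [6] * 100
service_rates = [4, 2, 1]

rr_peak, rr_final = simulate(make_round_robin(), arrivals, service_rates)
sq_peak, sq_final = simulate(shortest_queue, arrivals, service_rates)

print("round-robin    peak backlog:", rr_peak, "final:", rr_final)
print("shortest-queue peak backlog:", sq_peak, "final:", sq_final)
```

In this toy setting, round-robin keeps sending two tuples per slot to the instance that can serve only one, so that backlog grows without bound even though total capacity is sufficient, while the backlog-aware dispatcher keeps every queue short. The scheme described in the paper additionally exploits predicted future arrivals, which this sketch does not model.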
Abstract
Most online service providers deploy their own data stream processing systems in the cloud to conduct large-scale and real-time data analytics. However, such systems, e.g., Apache Heron, often adopt naive scheduling schemes to distribute data streams (in the units of tuples) among processing instances, which may result in workload imbalance and system disruption. Hence, there still exists a mismatch between the temporal variations of data streams and such inflexible scheduling scheme designs. Besides, the fundamental benefits of predictive scheduling to data stream processing systems also remain unexplored. In this paper, we focus on the problem of tuple scheduling with predictive service in Apache Heron. With a careful choice in the granularity of system modeling and decision making, we formulate the problem as a stochastic network optimization problem and propose POTUS, an online…
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Distributed and Parallel Computing Systems
