A Scheduling Algorithm to Maximize Storm Throughput in Heterogeneous Cluster
Hamid Nasiri, Saeed Nasehi, Arman Divband, Maziar Goudarzi

TL;DR
This paper introduces a heterogeneity-aware scheduling algorithm for distributed stream processing frameworks that optimizes vertex placement and resource utilization, significantly improving throughput on large-scale clusters.
Contribution
It presents a novel scheduling algorithm that maps application vertices to heterogeneous cluster nodes, outperforming the default scheduler in throughput while staying close to the optimal solution.
Findings
Achieves 7% to 44% throughput improvement over Storm's default scheduler.
Predicts CPU utilization with 92% accuracy.
Finds near-optimal solutions within 4% of the best possible.
Abstract
In the most popular distributed stream processing frameworks (DSPFs), programs are modeled as directed acyclic graphs. This model allows a DSPF to exploit the parallelism of distributed clusters. However, choosing the proper number of vertices for each operator and finding an appropriate mapping between these vertices and processing resources have a decisive effect on overall throughput and resource utilization, and the simplicity of current DSPF schedulers leads these frameworks to perform poorly on large-scale clusters. In this paper, we present the design and implementation of a heterogeneity-aware scheduling algorithm that finds the proper number of vertices for an application graph and maps each of them to the most suitable cluster node. We gradually scale up the application graph over a given cluster by increasing the topology input rate and taking new…
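As an illustration of what heterogeneity-aware placement means in practice, the following minimal sketch greedily assigns vertices to nodes of different capacity. It is not the paper's algorithm: the `Node` class, the `place_vertices` function, the capacity units, and the per-vertex CPU demands are all hypothetical, and the heuristic (largest demand first, onto the node with the lowest projected utilization) is a generic stand-in.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster node; capacity is in arbitrary, hypothetical CPU units."""
    name: str
    capacity: float
    load: float = 0.0
    vertices: list = field(default_factory=list)

    def utilization_with(self, demand: float) -> float:
        # Projected utilization if a vertex with this demand were added.
        return (self.load + demand) / self.capacity


def place_vertices(demands: dict, nodes: list) -> dict:
    """Greedy placement: handle the most demanding vertices first and put
    each one on the node whose projected utilization would be lowest."""
    placement = {}
    ordered = sorted(demands.items(), key=lambda kv: kv[1], reverse=True)
    for vertex, demand in ordered:
        best = min(nodes, key=lambda n: n.utilization_with(demand))
        if best.utilization_with(demand) > 1.0:
            raise ValueError(f"no node has spare capacity for {vertex}")
        best.load += demand
        best.vertices.append(vertex)
        placement[vertex] = best.name
    return placement


if __name__ == "__main__":
    cluster = [Node("fast", capacity=4.0), Node("slow", capacity=2.0)]
    demands = {"spout": 1.0, "bolt_a": 2.0, "bolt_b": 1.0}
    print(place_vertices(demands, cluster))
```

Because the choice is driven by projected utilization rather than by slot count, a node with twice the capacity absorbs roughly twice the work, which is the basic behavior a round-robin scheduler lacks on a heterogeneous cluster.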
Taxonomy
Topics: Cloud Computing and Resource Management · Data Stream Mining Techniques · Distributed and Parallel Computing Systems
