TL;DR
This paper evaluates classical and Deep Learning load prediction techniques for Distributed Stream Processing (DSP) systems, with the aim of improving resource management and Quality of Service (QoS) in data-intensive applications.
Contribution
It provides a comprehensive comparison of classical and Deep Learning load prediction methods tailored for DSP jobs across multiple real-world datasets.
Findings
Deep Learning methods generally outperform classical techniques in accuracy.
Deep Learning models require longer training times but offer better predictions.
Evaluation covers IoT, Web 2.0, and cluster monitoring domains.
Abstract
Distributed Stream Processing (DSP) systems process large streams of continuous data and produce results in near real time. They are an essential part of many data-intensive applications and analytics platforms. The rate at which events arrive at DSP systems can vary considerably over time, for example due to trends or cyclic and seasonal patterns within the data streams. A priori knowledge of incoming workloads enables proactive approaches to resource management and optimization tasks such as dynamic scaling, live migration of resources, and the tuning of configuration parameters at runtime, potentially leading to a better Quality of Service. In this paper, we conduct a comprehensive evaluation of different load prediction techniques for DSP jobs. We identify three use-cases and formulate requirements for making load predictions specific to DSP jobs.…
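To make the link between load prediction and proactive scaling concrete, here is a minimal sketch in Python. It assumes simple exponential smoothing as a stand-in for a classical forecasting technique; the summary above does not say which specific methods the paper evaluates, so this is illustrative only. The function names, the `per_worker_capacity` parameter, the `headroom` factor, and the sample arrival rates are all hypothetical.

```python
import math


def exponential_smoothing_forecast(rates, alpha=0.5):
    """One-step-ahead forecast of the event arrival rate.

    Assumption: simple exponential smoothing is used here purely as an
    illustrative classical baseline, not as the paper's method.
    """
    if not rates:
        raise ValueError("rates must be non-empty")
    level = rates[0]
    for rate in rates[1:]:
        # Blend the newest observation with the running level.
        level = alpha * rate + (1 - alpha) * level
    return level


def required_parallelism(predicted_rate, per_worker_capacity, headroom=1.2):
    """Number of workers needed to absorb the predicted rate.

    Hypothetical sizing rule: scale the forecast by a safety margin and
    divide by the throughput one worker can sustain (events per second).
    """
    return max(1, math.ceil(predicted_rate * headroom / per_worker_capacity))


# Hypothetical arrival rates (events per second) from recent intervals.
observed = [1000, 1200, 1400, 1600]
forecast = exponential_smoothing_forecast(observed, alpha=0.5)
workers = required_parallelism(forecast, per_worker_capacity=500)
print(forecast, workers)
```

A scaling controller could run this kind of calculation before each interval and adjust the job's parallelism ahead of the load change, rather than reacting after the backlog grows.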
