Smart Scheduling of Continuous Data-Intensive Workflows with Machine Learning Triggered Execution
Sérgio Esteves, Helena Galhardas, Luís Veiga

TL;DR
This paper presents SmartFlux, a machine learning-based middleware that optimizes continuous data workflows by adaptively triggering computations, significantly reducing resource usage while maintaining output accuracy within acceptable error bounds.
Contribution
It introduces a novel workflow model that relaxes trigger conditions based on input-output impact, leveraging machine learning to improve efficiency in data-intensive processing.
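A minimal sketch of what a relaxed trigger condition could look like, not SmartFlux's actual implementation. It assumes a step is re-run only when the estimated output deviation caused by accumulated input changes exceeds a tolerated bound. The names `RelaxedTrigger`, `predict_error`, `max_error`, and `run_demo` are hypothetical, and the linear predictor stands in for whatever learned model the middleware uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RelaxedTrigger:
    """Decide whether a workflow step should re-run for a new wave of input.

    predict_error: hypothetical model mapping accumulated input change to an
        estimated output deviation (a stand-in for a learned predictor).
    max_error: tolerated output deviation before recomputation is forced.
    """
    predict_error: Callable[[float], float]
    max_error: float
    pending_change: float = 0.0
    decisions: List[bool] = field(default_factory=list)

    def observe(self, input_change: float) -> bool:
        # Accumulate change across skipped waves so deferred work is not lost.
        self.pending_change += abs(input_change)
        fire = self.predict_error(self.pending_change) > self.max_error
        if fire:
            # The step runs now, so the backlog of unprocessed change is cleared.
            self.pending_change = 0.0
        self.decisions.append(fire)
        return fire


def run_demo() -> List[bool]:
    # Assumed toy predictor: output deviation is half the accumulated input change.
    trigger = RelaxedTrigger(predict_error=lambda change: 0.5 * change, max_error=1.0)
    return [trigger.observe(change) for change in [0.5, 0.5, 1.5, 0.25, 3.0]]


if __name__ == "__main__":
    print(run_demo())
```

In this toy run the step executes on two of five input waves; a strictly synchronized workflow would execute it on all five.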
Findings
Resource savings of up to 40% without significant output deviation
High confidence in maintaining output accuracy within error bounds
Seamless integration with existing workflow managers
Abstract
To extract value from ever-growing volumes of data, coming from a number of different sources, and to drive decision making, organizations frequently resort to the composition of data processing workflows, since they are expressive, flexible, and scalable. The typical workflow model enforces strict temporal synchronization across processing steps without accounting for the actual effect of intermediate computations on the final workflow output. However, this is not the most desirable behavior in a multitude of scenarios. We identify a class of applications for continuous data processing where workflow output changes slowly and without great significance in a short-to-medium time window, thus wasting compute resources and energy with current approaches. To overcome such inefficiency, we introduce a novel workflow model, for continuous and data-intensive processing, capable of relaxing…
Taxonomy
Topics: Scientific Computing and Data Management · Cloud Computing and Resource Management · Advanced Data Storage Technologies
