QoSFlow: Ensuring Service Quality of Distributed Workflows Using Interpretable Sensitivity Models
Md Hasanur Rashid, Jesun Firoz, Nathan R. Tallent, Luanzheng Guo, Meng Tang, Dong Dai

TL;DR
QoSFlow is a modeling approach that partitions workflow configurations into regions with similar behavior to enable efficient, analytical QoS scheduling, outperforming standard heuristics and ensuring reliable execution outcomes.
Contribution
Introduces QoSFlow, a novel performance modeling method that uses interpretable sensitivity models to improve QoS scheduling for distributed workflows.
Findings
QoSFlow's execution recommendations outperform the best-performing standard heuristic by 27.38%.
Configurations recommended by QoSFlow match measured outcomes across various QoS constraints.
Evaluation on three workflows demonstrates effectiveness and generality.
Abstract
With the increasing importance of distributed scientific workflows, there is a critical need to ensure Quality of Service (QoS) constraints, such as minimizing time or limiting execution to resource subsets. However, the unpredictable nature of workflow behavior, even with similar configurations, makes it difficult to provide QoS guarantees. For effective reasoning about QoS scheduling, we introduce QoSFlow, a performance modeling method that partitions a workflow's execution configuration space into regions with similar behavior. Each region groups configurations with comparable execution times according to a given statistical sensitivity, enabling efficient QoS-driven scheduling through analytical reasoning rather than exhaustive testing. Evaluation on three diverse workflows shows that QoSFlow's execution recommendations outperform the best-performing standard heuristic by 27.38%.…
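The abstract describes grouping execution configurations whose times are comparable under a given statistical sensitivity, then answering QoS queries against those groups instead of testing every configuration. The sketch below is a toy illustration of that idea under stated assumptions, not QoSFlow's actual model (the abstract does not specify it). It greedily groups already-measured (configuration, time) samples, treating a time as comparable when it lies within a relative threshold `sensitivity` of the fastest time in the current region, and it treats a QoS constraint (for example, limiting execution to a resource subset) as a predicate over configurations. The `Config` fields (`nodes`, `storage`), the threshold rule, and the names `partition_regions` and `recommend` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Config:
    # Hypothetical configuration dimensions, for illustration only.
    nodes: int      # number of compute nodes
    storage: str    # storage tier, e.g. "ssd" or "hdd"


def partition_regions(
    samples: list[tuple[Config, float]], sensitivity: float = 0.1
) -> list[list[tuple[Config, float]]]:
    """Group samples into regions of comparable execution time.

    Assumption: a time joins the current region if it is within
    `sensitivity` (relative) of that region's fastest time.
    Regions are returned fastest first because samples are sorted.
    """
    regions: list[list[tuple[Config, float]]] = []
    for cfg, t in sorted(samples, key=lambda s: s[1]):
        if regions and t <= regions[-1][0][1] * (1 + sensitivity):
            regions[-1].append((cfg, t))
        else:
            regions.append([(cfg, t)])
    return regions


def recommend(
    regions: list[list[tuple[Config, float]]],
    constraint: Callable[[Config], bool],
) -> Optional[Config]:
    """Return the fastest configuration in the fastest region that
    contains any configuration satisfying the QoS constraint."""
    for region in regions:  # fastest region first
        feasible = [cfg for cfg, _ in region if constraint(cfg)]
        if feasible:
            return feasible[0]
    return None  # no configuration satisfies the constraint


if __name__ == "__main__":
    # Made-up timings (seconds) for the usage example only.
    samples = [
        (Config(4, "ssd"), 100.0),
        (Config(2, "ssd"), 105.0),
        (Config(4, "hdd"), 150.0),
        (Config(2, "hdd"), 160.0),
        (Config(1, "hdd"), 300.0),
    ]
    regions = partition_regions(samples, sensitivity=0.1)
    print([len(r) for r in regions])
    # QoS constraint: restrict execution to the "hdd" resource subset.
    print(recommend(regions, lambda c: c.storage == "hdd"))
```

With a 10% sensitivity, the made-up timings above form three regions, and restricting execution to the "hdd" subset selects `Config(nodes=4, storage='hdd')` from the second region. Because regions come out ordered fastest first, a constrained query only needs to scan regions until one contains a feasible configuration; QoSFlow itself reasons over a performance model rather than a full table of measurements, so this sketch illustrates only the region lookup.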
Taxonomy
Topics: Scientific Computing and Data Management · Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
