Workflow-Driven Modeling for the Compute Continuum: An Optimization Approach to Automated System and Workload Scheduling
Aasish Kumar Sharma, Christian Boehme, Patrick Gelß, Ramin Yahyapour, Julian Kunkel

TL;DR
This paper presents a comprehensive modeling framework that automates workload scheduling across cloud and HPC resources, improving efficiency and reducing execution times in the compute continuum.
Contribution
It introduces a novel system and workload modeling approach that enables automated, optimized task orchestration across heterogeneous cloud and HPC infrastructures.
Findings
The solution based on mixed-integer linear programming (MILP) achieves optimal scheduling for small workflows.
Heuristic methods provide up to 99% faster estimates for large workflows.
Scheduling efficiency is significantly improved, reducing execution times.
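The trade-off between an exact solver and a heuristic can be illustrated with a small sketch. This is not the paper's model. It assumes independent tasks with no workflow dependencies or data transfers, uses hypothetical runtimes, and swaps in a brute-force search for the MILP solver so that the example needs no external library. The names `RUNTIME`, `makespan`, `exhaustive_schedule`, and `greedy_schedule` are invented for illustration.

```python
from itertools import product

# Hypothetical toy data: RUNTIME[t][n] is the time task t needs on node n
# (node 0 could stand for an HPC partition, node 1 for a cloud instance).
RUNTIME = [
    [4, 8],
    [3, 6],
    [6, 3],
    [5, 5],
]


def makespan(assignment, runtime):
    """Return the finish time of the busiest node for a task -> node assignment."""
    load = [0] * len(runtime[0])
    for task, node in enumerate(assignment):
        load[node] += runtime[task][node]
    return max(load)


def exhaustive_schedule(runtime):
    """Try every assignment: optimal, but exponential in the number of tasks.

    Stand-in for an exact method such as MILP (assumption, not the paper's solver).
    """
    nodes = range(len(runtime[0]))
    return min(product(nodes, repeat=len(runtime)),
               key=lambda a: makespan(a, runtime))


def greedy_schedule(runtime):
    """Place each task, in order, on the node where it would finish earliest."""
    load = [0] * len(runtime[0])
    assignment = []
    for times in runtime:
        node = min(range(len(load)), key=lambda n: load[n] + times[n])
        load[node] += times[node]
        assignment.append(node)
    return tuple(assignment)


if __name__ == "__main__":
    print("exact :", makespan(exhaustive_schedule(RUNTIME), RUNTIME))
    print("greedy:", makespan(greedy_schedule(RUNTIME), RUNTIME))
```

On this toy input the exhaustive search finds a makespan of 8, while the greedy pass settles for 9. The greedy pass touches each task once, whereas the exhaustive search grows as nodes to the power of tasks, which is the general reason exact methods are usually reserved for small workflows.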
Abstract
The convergence of IoT, Edge, Cloud, and HPC technologies creates a compute continuum that merges cloud scalability and flexibility with HPC's computational power and specialized optimizations. However, integrating cloud and HPC resources often introduces latency and communication overhead, which can hinder the performance of tightly coupled parallel applications. Additionally, achieving seamless interoperability between cloud and on-premises HPC systems requires advanced scheduling, resource management, and data transfer protocols. Consequently, users must manually allocate complex workloads across heterogeneous resources, leading to suboptimal task placement and reduced efficiency due to the absence of an automated scheduling mechanism. To overcome these challenges, we introduce a comprehensive framework based on rigorous system and workload modeling for the compute continuum. Our…
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Scientific Computing and Data Management
