Data-aware Dynamic Execution of Irregular Workloads on Heterogeneous Systems

Zhenyu Bai; Dan Wu; Pranav Dangi; Dhananjaya Wijerathne; Venkata Pavan Kumar Miriyala; Tulika Mitra

arXiv:2502.06304·cs.DC·February 12, 2025

Open Access

TL;DR

DyPe is a dynamic scheduling framework that automatically optimizes workload distribution on heterogeneous systems with accelerators, significantly improving performance and energy efficiency over static methods.

Contribution

The paper introduces DyPe, a data-aware, multi-objective scheduling approach for heterogeneous systems with accelerators that dynamically adapts to workload and system characteristics.

Findings

01

DyPe finds optimal schedules in 89.5% of cases, outperforming static scheduling.

02

DyPe improves throughput by 1.53x and energy efficiency by 1.09x on average.

03

Conventional static scheduling is optimal in only 15% of cases.

Abstract

Current approaches to scheduling workloads on heterogeneous systems with specialized accelerators often rely on manual partitioning, offloading tasks with specific compute patterns to accelerators. This method requires extensive experimentation and human effort to identify the tasks suitable for the accelerator. To solve this problem, we introduce DyPe, a scheduling framework tailored for heterogeneous systems with specialized accelerators. Our method automatically partitions, deploys, and reschedules execution when necessary by dynamically analyzing the characteristics of the input data and leveraging the interoperator parallelism among heterogeneous devices. DyPe navigates a multi-objective, multi-constraint design space that considers both system constraints and application requirements, which allows it to discover Pareto-optimal mapping configurations, improving the system's…
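
As a rough illustration of what "Pareto-optimal mapping configurations" means in the abstract, the sketch below filters a set of candidate mappings down to those not dominated on two objectives (throughput and energy). It is not DyPe's algorithm: the `Mapping` type, the candidate names, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    """A hypothetical mapping configuration with two objectives."""
    name: str
    throughput: float  # relative throughput, higher is better
    energy: float      # relative energy per inference, lower is better


def dominates(a: Mapping, b: Mapping) -> bool:
    """True if `a` is at least as good as `b` on both objectives
    and strictly better on at least one."""
    no_worse = a.throughput >= b.throughput and a.energy <= b.energy
    strictly_better = a.throughput > b.throughput or a.energy < b.energy
    return no_worse and strictly_better


def pareto_front(candidates: List[Mapping]) -> List[Mapping]:
    """Keep only candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Illustrative, made-up candidates (not measurements from the paper).
candidates = [
    Mapping("cpu_only", 1.0, 1.0),
    Mapping("accel_only", 1.4, 1.3),
    Mapping("split_a", 1.5, 1.1),
    Mapping("split_b", 1.2, 1.2),
]

front = pareto_front(candidates)
for m in front:
    print(m.name, m.throughput, m.energy)
```

With these made-up numbers, `split_a` dominates both `accel_only` and `split_b`, so only `cpu_only` (lowest energy) and `split_a` (highest throughput) remain. A scheduler with application constraints, such as an energy budget, would then pick among the surviving configurations.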

Taxonomy

Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Software System Performance and Reliability