Orchestrated Co-scheduling, Resource Partitioning, and Power Capping on CPU-GPU Heterogeneous Systems via Machine Learning

Issa Saba; Eishi Arima; Dai Liu; Martin Schulz

arXiv:2405.03831 · cs.DC · May 8, 2024

TL;DR

This paper presents a machine learning-based approach to optimize co-scheduling, resource partitioning, and power capping in CPU-GPU systems to maximize throughput while respecting power constraints.

Contribution

It introduces a predictive performance modeling framework that jointly optimizes scheduling, resource allocation, and power capping on heterogeneous systems.

Findings

01. Achieves up to 67% speedup over naive scheduling.

02. Effectively balances power and performance in CPU-GPU systems.

03. Demonstrates the benefits of ML-driven optimization in real hardware.

Abstract

CPU-GPU heterogeneous architectures are now commonly used in a wide variety of computing systems, from mobile devices to supercomputers. Maximizing the throughput of multi-programmed workloads on such systems is indispensable, as a single program typically cannot fully exploit all available resources. At the same time, power consumption is a key issue and often requires optimizing power allocations to the CPU and GPU while enforcing a total power constraint, in particular when the power/thermal requirements are strict. The result is a system-wide optimization problem with several knobs. In particular, we focus on (1) co-scheduling decisions, i.e., selecting programs to co-locate in a space-sharing manner; (2) resource partitioning on both CPUs and GPUs; and (3) power capping on both CPUs and GPUs. We solve this problem using predictive performance modeling based on machine learning in…
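
To make the shape of the optimization problem concrete, here is a minimal Python sketch of an exhaustive search over the three knobs named in the abstract. It is not the paper's method: the function names, the candidate values, and in particular `toy_predictor` are hypothetical stand-ins for a learned performance model, and the partitioning is reduced to a single GPU share for brevity.

```python
from itertools import combinations, product

# Hypothetical inputs: program name -> fraction of the GPU it needs to run at full speed.
PROGRAMS = {"A": 0.3, "B": 0.7, "C": 0.5}
GPU_SHARES = [0.25, 0.5, 0.75]    # assumed share of GPU resources given to the first program
CPU_CAPS = [100.0, 150.0]         # assumed candidate CPU power caps (watts)
GPU_CAPS = [150.0, 200.0, 250.0]  # assumed candidate GPU power caps (watts)


def toy_predictor(programs, pair, gpu_share, cpu_cap, gpu_cap):
    """Illustrative stand-in for a learned model; returns a predicted throughput score."""
    first, second = pair
    # Throughput is scaled down when either power cap is below an assumed maximum.
    power_factor = min(1.0, gpu_cap / 250.0) * min(1.0, cpu_cap / 150.0)
    # Each program runs at full speed only if its share covers its demand.
    speed_first = min(1.0, gpu_share / programs[first])
    speed_second = min(1.0, (1.0 - gpu_share) / programs[second])
    return (speed_first + speed_second) * power_factor


def best_configuration(programs, power_budget, predictor, gpu_shares, cpu_caps, gpu_caps):
    """Search co-scheduled pairs, partitions, and power caps under a total power budget.

    Returns (score, pair, gpu_share, cpu_cap, gpu_cap), or None if no
    combination of caps fits within the budget.
    """
    best = None
    for pair in combinations(sorted(programs), 2):          # knob 1: which programs to co-locate
        for share, cpu_cap, gpu_cap in product(gpu_shares, cpu_caps, gpu_caps):
            if cpu_cap + gpu_cap > power_budget:            # enforce the total power constraint
                continue
            score = predictor(programs, pair, share, cpu_cap, gpu_cap)
            if best is None or score > best[0]:
                best = (score, pair, share, cpu_cap, gpu_cap)
    return best


if __name__ == "__main__":
    print(best_configuration(PROGRAMS, 300.0, toy_predictor, GPU_SHARES, CPU_CAPS, GPU_CAPS))
```

Brute force is only workable here because the toy search space is tiny. The point of the sketch is that a predictive model lets candidate configurations be scored without running them, so the selection step becomes a search over predicted scores.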
