Machine Learning for CUDA+MPI Design Rules

Carl Pearson; Aurya Javeed; Karen Devine

arXiv:2203.02530 · cs.PF · March 21, 2022

TL;DR

This paper introduces a machine learning-based approach to automatically explore and generate design rules for CUDA+MPI programs, helping optimize performance by identifying impactful design choices.

Contribution

It presents a novel method combining Monte-Carlo tree search and decision trees to efficiently discover and recommend high-impact design configurations for CUDA+MPI applications.

Findings

01 Effective identification of performance-critical design regions

02 Automated generation of design rules for CUDA+MPI programs

03 Demonstrated on sparse-matrix vector multiplication

Abstract

We present a new strategy for automatically exploring the design space of key CUDA+MPI programs and providing design rules that discriminate slow from fast implementations. In such programs, the order of operations (e.g., GPU kernels, MPI communication) and the assignment of operations to resources (e.g., GPU streams) make the space of possible designs enormous. Systems experts have the task of redesigning and reoptimizing these programs to effectively utilize each new platform. This work provides a prototype tool to reduce that burden. In our approach, a directed acyclic graph of CUDA and MPI operations defines the design space for the program. Monte-Carlo tree search discovers regions of the design space that have a large impact on the program's performance. A sequence-to-vector transformation defines features for each explored implementation, and each implementation is assigned a class…
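To make the idea of a DAG-defined design space concrete, the sketch below enumerates every operation ordering that a small dependency graph allows and turns each ordering into a feature vector. It is an illustration only, not the authors' tool: the operation names, the `DEPS` graph, and the pairwise "a runs before b" encoding are all assumptions chosen for the example, since the abstract does not spell out the actual transformation. Brute-force enumeration is also a stand-in; the paper uses Monte-Carlo tree search precisely because real design spaces are too large to enumerate.

```python
from itertools import permutations

# Hypothetical toy DAG (not from the paper): each operation maps to the
# set of operations that must complete before it may start.
DEPS = {
    "pack": set(),
    "send": {"pack"},
    "recv": set(),
    "local_spmv": set(),
    "remote_spmv": {"recv"},
}


def valid_orderings(deps):
    """Return every ordering of the operations that respects the DAG."""
    ops = sorted(deps)
    result = []
    for perm in permutations(ops):
        pos = {op: i for i, op in enumerate(perm)}
        # Keep the permutation only if each dependency precedes its dependent.
        if all(pos[d] < pos[op] for op in ops for d in deps[op]):
            result.append(perm)
    return result


def order_features(seq):
    """Assumed sequence-to-vector encoding: one 0/1 entry per pair (a, b)
    of operations, taken in alphabetical order, set to 1 if a precedes b."""
    ops = sorted(seq)
    pos = {op: i for i, op in enumerate(seq)}
    return [
        int(pos[a] < pos[b])
        for i, a in enumerate(ops)
        for b in ops[i + 1:]
    ]


orderings = valid_orderings(DEPS)
vectors = [order_features(seq) for seq in orderings]
print(len(orderings), "valid orderings,", len(vectors[0]), "features each")
```

Vectors of this kind, paired with a fast/slow class label from measured run times, are the sort of input a decision tree could split on to yield rules such as "run `send` before `local_spmv`". The assignment of operations to GPU streams, which the abstract also treats as a design choice, is left out of this sketch.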

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Ferroelectric and Negative Capacitance Devices · Embedded Systems Design Techniques

Methods: Monte-Carlo Tree Search