A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code

Kazuaki Matsumura; Simon Garcia De Gonzalo; Antonio J. Peña

arXiv:2301.11389·cs.DC·January 30, 2023


TL;DR

This paper introduces a symbolic emulator tool that enhances low-level GPU optimizations for directive-based programming models like OpenACC by enabling automated shuffle instruction synthesis, improving performance across GPU generations.

Contribution

The paper presents a novel symbolic analysis-based emulator integrated into the compilation pipeline supporting CUDA and OpenACC, enabling low-level shuffle instruction optimizations previously difficult to achieve.

Findings

01 Automated shuffle instruction synthesis improves GPU performance.

02 The emulator supports multiple GPU architectures.

03 Enhanced low-level optimizations for OpenACC applications.

Abstract

Various kinds of applications take advantage of GPUs through automation tools that attempt to exploit the available performance of the GPU's parallel architecture. Directive-based programming models, such as OpenACC, are one such method: they enable parallel computing simply by attaching annotations to loops in the code. Such abstract models, however, often prevent programmers from making additional low-level optimizations that take advantage of the advanced architectural features of GPUs, because the actual generated computation is hidden from the application developer. This paper describes and implements a novel, flexible optimization technique that operates by inserting a code emulator phase at the tail end of the compilation pipeline. Our tool emulates the generated code using symbolic analysis by substituting dynamic information and thus allowing for further low-level…
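To make the idea of symbolic emulation for shuffle synthesis more concrete, here is a minimal Python sketch. It is not the paper's algorithm or the tool's code: the `Load` record, `find_shuffle_candidates`, and `emulate_warp` are hypothetical names, and the input is a toy instruction list rather than real PTX. The assumption is that each load index is an affine function of a symbolic lane id. When two loads from the same array differ only by a constant multiple of the lane coefficient, the value one lane needs is already held in a register of a neighboring lane, so a warp shuffle (such as PTX `shfl.sync`) could supply it for the lanes whose neighbor is inside the warp.

```python
from dataclasses import dataclass

WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


@dataclass(frozen=True)
class Load:
    """Hypothetical toy instruction: dest = array[coeff * lane + offset]."""
    dest: str
    array: str
    coeff: int   # multiplier on the symbolic lane id
    offset: int  # constant part of the index


def find_shuffle_candidates(loads):
    """Return (dest, source, delta) triples where lane t's `dest` equals
    the value that lane t + delta already holds in `source`."""
    candidates = []
    seen = []
    for ld in loads:
        for prev in seen:
            if prev.array != ld.array or prev.coeff != ld.coeff or ld.coeff == 0:
                continue
            diff = ld.offset - prev.offset
            # Same index as `prev`, shifted by a whole number of lanes?
            if diff % ld.coeff == 0:
                delta = diff // ld.coeff
                if delta != 0 and abs(delta) < WARP_SIZE:
                    candidates.append((ld.dest, prev.dest, delta))
                    break
        seen.append(ld)
    return candidates


def emulate_warp(loads, memory):
    """Concretely execute the loads for every lane (used to check candidates)."""
    return [
        {ld.dest: memory[ld.array][ld.coeff * lane + ld.offset] for ld in loads}
        for lane in range(WARP_SIZE)
    ]


# A three-point access pattern: a[lane], a[lane + 1], a[lane + 2].
loads = [Load("r0", "a", 1, 0), Load("r1", "a", 1, 1), Load("r2", "a", 1, 2)]
candidates = find_shuffle_candidates(loads)
```

In this sketch the second and third loads are both reported as obtainable from `r0` in a higher lane. Lanes near the warp boundary have no such neighbor and would still need a real load, which is one reason an actual implementation has to reason about more than index arithmetic.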


Code & Models

Repositories

khaki3/ptxas-wrapper (Official)
