TL;DR
This paper introduces a symbolic emulator tool that enhances low-level GPU optimizations for directive-based programming models like OpenACC by enabling automated shuffle instruction synthesis, improving performance across GPU generations.
Contribution
The paper presents a novel symbolic analysis-based emulator integrated into the compilation pipeline supporting CUDA and OpenACC, enabling low-level shuffle instruction optimizations previously difficult to achieve.
Findings
Automated shuffle instruction synthesis improves GPU performance.
The emulator supports multiple GPU architectures.
The approach enables low-level optimizations for OpenACC applications that the directive-based model otherwise hides.
Abstract
Many applications take advantage of GPUs through automation tools that attempt to exploit the available performance of the GPU's parallel architecture. Directive-based programming models, such as OpenACC, are one such method: they enable parallel computing simply by attaching annotations to loops. Such abstract models, however, often prevent programmers from making additional low-level optimizations that take advantage of the advanced architectural features of GPUs, because the actual generated computation is hidden from the application developer. This paper describes and implements a novel, flexible optimization technique that operates by inserting a code emulator phase at the tail end of the compilation pipeline. Our tool emulates the generated code using symbolic analysis by substituting dynamic information and thus allowing for further low-level…
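To make the idea of shuffle synthesis concrete, here is a minimal sketch in Python. It is not the paper's tool. It only emulates one 32-lane warp and shows why a value that a neighboring lane has already loaded can be fetched with a warp shuffle instead of a second memory load. The three-point stencil, the function names (`shfl_up`, `shfl_down`, `stencil_direct`, `stencil_shuffled`), and the edge-lane fallback are all assumptions made for illustration. The shuffle helpers mimic the documented behavior of CUDA's `__shfl_up_sync` and `__shfl_down_sync`, where out-of-range lanes keep their own value.

```python
WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def shfl_up(vals, delta):
    """Emulate __shfl_up_sync: lane i reads lane i - delta.
    Lanes with i < delta have no source lane and keep their own value."""
    return [vals[i - delta] if i >= delta else vals[i] for i in range(len(vals))]


def shfl_down(vals, delta):
    """Emulate __shfl_down_sync: lane i reads lane i + delta.
    Lanes past the end of the warp keep their own value."""
    n = len(vals)
    return [vals[i + delta] if i + delta < n else vals[i] for i in range(n)]


def stencil_direct(a, base):
    """Hypothetical baseline: every lane issues three loads from memory."""
    return [a[base + t - 1] + a[base + t] + a[base + t + 1]
            for t in range(WARP_SIZE)]


def stencil_shuffled(a, base):
    """Hypothetical optimized form: one load per lane, neighbors via shuffle."""
    center = [a[base + t] for t in range(WARP_SIZE)]  # the only full-warp load
    left = shfl_up(center, 1)     # lane t receives a[base + t - 1] from lane t - 1
    right = shfl_down(center, 1)  # lane t receives a[base + t + 1] from lane t + 1
    # Edge lanes have no neighbor inside the warp, so they fall back to a load.
    left[0] = a[base - 1]
    right[WARP_SIZE - 1] = a[base + WARP_SIZE]
    return [l + c + r for l, c, r in zip(left, center, right)]


if __name__ == "__main__":
    data = list(range(100))
    assert stencil_direct(data, 10) == stencil_shuffled(data, 10)
    print("shuffled stencil matches direct loads")
```

Under these assumptions, the rewrite is legal because the index expressions `base + t - 1` and `base + t + 1` evaluated in lane `t` equal the center index `base + t` evaluated in lanes `t - 1` and `t + 1`. Detecting that kind of equality between per-thread index expressions is the sort of fact a symbolic emulation of the generated code could establish automatically.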
