Improving OpenCL Performance by Specializing Compiler Phase Selection and Ordering
Ricardo Nobre, Luís Reis, João M. P. Cardoso

TL;DR
This paper explores how specializing compiler phase ordering for OpenCL kernels on an NVIDIA GPU can significantly improve performance, reporting speedups of up to 5.7x (geometric mean 1.65x) over the same kernels compiled from source with the NVIDIA OpenCL driver.
Contribution
It introduces a method for specializing LLVM compiler phase orders for OpenCL GPU kernels, leading to substantial performance gains compared to default compilation.
Findings
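Achieved geometric mean improvements of 1.54x (up to 5.48x) over PTX generated by the NVIDIA CUDA compiler and 1.65x (up to 5.7x) over the NVIDIA OpenCL driver using specialized phase orders.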
Demonstrated the importance of specific phase sequences for OpenCL kernel optimization.
Showed that using code features and similar benchmarks can further enhance performance.
Abstract
Automatic compiler phase selection/ordering has traditionally been focused on CPUs and, to a lesser extent, FPGAs. We present experiments regarding compiler phase ordering specialization of OpenCL kernels targeting a GPU. We use iterative exploration to specialize LLVM phase orders on 15 OpenCL benchmarks to an NVIDIA GPU. We analyze the generated NVIDIA PTX code for the various versions to identify the main causes of the most significant improvements and present results of a set of experiments that demonstrate the importance of using specific phase orders. Using specialized compiler phase orders, we were able to achieve geometric mean improvements of 1.54x (up to 5.48x) and 1.65x (up to 5.7x) over PTX generated by the NVIDIA CUDA compiler from CUDA versions of the same kernels, and over execution of the OpenCL kernels compiled from source with the NVIDIA OpenCL driver, respectively. We…
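The iterative exploration mentioned in the abstract can be illustrated with a minimal random-search sketch. This is not the authors' implementation: the pass names in PASSES, the toy_runtime cost model, and the explore and geometric_mean helpers are all hypothetical stand-ins. In a real setup, the measure function would compile the kernel with the candidate phase order and time it on the GPU.

```python
import math
import random

# Hypothetical pool of pass names; the paper's actual pass set is not given in this summary.
PASSES = ["licm", "gvn", "instcombine", "loop-unroll", "sroa", "simplifycfg"]


def toy_runtime(order):
    """Toy stand-in for 'compile the kernel with this phase order and time it'."""
    t = 100.0
    if "sroa" in order:
        t *= 0.8
    # Reward one specific relative ordering, to mimic order sensitivity.
    if ("licm" in order and "loop-unroll" in order
            and order.index("licm") < order.index("loop-unroll")):
        t *= 0.6
    return t


def explore(measure, passes, iterations=200, max_len=6, seed=0):
    """Random iterative search: sample phase orders and keep the fastest one seen."""
    rng = random.Random(seed)
    best_order, best_time = [], measure([])  # baseline: no extra passes
    for _ in range(iterations):
        length = rng.randint(1, max_len)
        candidate = [rng.choice(passes) for _ in range(length)]
        t = measure(candidate)
        if t < best_time:
            best_order, best_time = candidate, t
    return best_order, best_time


def geometric_mean(values):
    """Geometric mean, the usual way to aggregate per-benchmark speedups."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


best_order, best_time = explore(toy_runtime, PASSES)
speedup = toy_runtime([]) / best_time
```

Per-benchmark speedups obtained this way would then be combined with geometric_mean, which is the kind of aggregate the abstract reports.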
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Scientific Computing and Data Management
