Optimising GPGPU Execution Through Runtime Micro-Architecture Parameter Analysis

Giuseppe M. Sarda; Nimish Shah; Debjyoti Bhattacharjee; Peter Debacker; Marian Verhelst

arXiv:2407.11999·cs.AR·July 18, 2024

TL;DR

This paper introduces a hardware-aware runtime mapping technique for open-source GPGPU platforms that optimizes performance by analyzing micro-architecture parameters, surpassing traditional hardware-agnostic methods.

Contribution

It presents a novel micro-architecture parameter analysis approach for runtime kernel mapping on open-source GPGPUs, improving performance and resource utilization.

Findings

1. Significant performance improvements on the Vortex GPGPU
2. Effective optimization across various GPU configurations
3. Enhanced hardware resource utilization

Abstract

GPGPU execution analysis has always been tied to closed-source, proprietary benchmarking tools that provide high-level, non-exhaustive, and/or statistical information, preventing a thorough understanding of bottlenecks and optimization possibilities. Open-source hardware platforms offer opportunities to overcome such limits and co-optimize the full {hardware-mapping-algorithm} compute stack. Yet, so far, this has remained under-explored. In this work, we exploit micro-architecture parameter analysis to develop a hardware-aware, runtime mapping technique for OpenCL kernels on the open Vortex RISC-V GPGPU. Our method is based on trace observations and targets optimal hardware resource utilization to achieve superior performance and flexibility compared to hardware-agnostic mapping approaches. The technique was validated on different architectural GPU configurations across several OpenCL…
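To make the contrast between hardware-agnostic and hardware-aware mapping concrete, here is a minimal sketch. It is not the authors' algorithm: the `GpuConfig` type, both function names, and the simplified model (each work-group occupies one hardware thread, and all hardware threads run one work-group per pass) are assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class GpuConfig:
    """Hypothetical description of a GPGPU configuration (illustrative only)."""
    cores: int
    warps_per_core: int
    threads_per_warp: int

    @property
    def hardware_threads(self) -> int:
        # Total number of hardware thread slots available in parallel.
        return self.cores * self.warps_per_core * self.threads_per_warp


def hardware_aware_local_size(global_size: int, gpu: GpuConfig) -> int:
    """Choose a local work size so the number of work-groups is close to,
    and never exceeds, the number of hardware threads (assumed heuristic)."""
    return max(1, ceil(global_size / gpu.hardware_threads))


def utilization(global_size: int, local_size: int, gpu: GpuConfig) -> float:
    """Fraction of hardware-thread slots kept busy under the simplified
    model: one work-group per hardware thread per scheduling pass."""
    groups = ceil(global_size / local_size)
    passes = ceil(groups / gpu.hardware_threads)
    return groups / (passes * gpu.hardware_threads)


if __name__ == "__main__":
    gpu = GpuConfig(cores=4, warps_per_core=4, threads_per_warp=4)
    global_size = 100

    fixed = 16  # hardware-agnostic choice, independent of the GPU shape
    aware = hardware_aware_local_size(global_size, gpu)

    print("fixed local size:", fixed, "utilization:", utilization(global_size, fixed, gpu))
    print("aware local size:", aware, "utilization:", utilization(global_size, aware, gpu))
```

Under these assumptions, 100 work-items on a 64-thread configuration form only 7 work-groups with a fixed local size of 16, whereas the hardware-aware choice of 2 forms 50 work-groups and keeps far more thread slots busy. A real mapping would also need to account for kernel-specific behaviour such as memory access patterns, which this sketch ignores.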
