Can Large Language Models Predict Parallel Code Performance?

Gregory Bolet; Giorgis Georgakoudis; Harshitha Menon; Konstantinos Parasyris; Niranjan Hasabnis; Hayden Estes; Kirk W. Cameron; Gal Oren

arXiv:2505.03988·cs.DC·May 8, 2025

Open Access

TL;DR

This paper investigates whether large language models can predict GPU kernel performance class (compute-bound or bandwidth-bound) from source code and hardware info, potentially replacing costly hardware profiling.

Contribution

The paper introduces an approach that uses LLMs for source-level roofline classification, reporting high accuracy when profiling data is supplied and promising results in zero- and few-shot scenarios.

Findings

01. LLMs achieve 100% accuracy with profiling data

02. Zero-shot LLMs reach up to 64% accuracy without profiling

03. Fine-tuning LLMs requires more data than currently available

Abstract

Accurate determination of the performance of parallel GPU code typically requires execution-time profiling on target hardware -- an increasingly prohibitive step due to limited access to high-end GPUs. This paper explores whether Large Language Models (LLMs) can offer an alternative approach for GPU performance prediction without relying on hardware. We frame the problem as a roofline classification task: given the source code of a GPU kernel and the hardware specifications of a target GPU, can an LLM predict whether the GPU kernel is compute-bound or bandwidth-bound? For this study, we build a balanced dataset of 340 GPU kernels, obtained from the HeCBench benchmark and written in CUDA and OpenMP, along with their ground-truth labels obtained via empirical GPU profiling. We evaluate LLMs across four scenarios: (1) with access to profiling data of the kernel source, (2) zero-shot with…
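To make the classification task concrete, here is a minimal Python sketch of the standard roofline rule that separates compute-bound from bandwidth-bound kernels. It is an illustration under stated assumptions, not the authors' labeling pipeline: the names `GpuSpec` and `classify_kernel` are hypothetical, the hardware numbers are invented for the example, and treating a kernel that sits exactly on the ridge point as compute-bound is an arbitrary convention.

```python
from dataclasses import dataclass


@dataclass
class GpuSpec:
    """Peak numbers for a hypothetical target GPU (illustrative only)."""
    peak_flops: float      # peak compute throughput, in FLOP/s
    peak_bandwidth: float  # peak memory bandwidth, in bytes/s

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) where the two roofs meet."""
        return self.peak_flops / self.peak_bandwidth


def classify_kernel(flops: float, bytes_moved: float, gpu: GpuSpec) -> str:
    """Label a kernel by comparing its arithmetic intensity to the ridge point.

    `flops` and `bytes_moved` are the operation and memory-traffic counts for
    one kernel run, e.g. as a profiler would report them.
    """
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    intensity = flops / bytes_moved  # FLOP per byte
    # Assumption: a kernel exactly on the ridge is labeled compute-bound.
    return "compute-bound" if intensity >= gpu.ridge_point else "bandwidth-bound"


# Invented hardware: 10 TFLOP/s and 1 TB/s give a ridge point of 10 FLOP/byte.
gpu = GpuSpec(peak_flops=10e12, peak_bandwidth=1e12)

low_intensity = classify_kernel(flops=2e9, bytes_moved=1e9, gpu=gpu)    # 2 FLOP/byte
high_intensity = classify_kernel(flops=5e10, bytes_moved=1e9, gpu=gpu)  # 50 FLOP/byte
print(low_intensity, high_intensity)
```

With profiler counts in hand, the label reduces to this single comparison. The harder settings the abstract describes are those where the operation and byte counts are not given and would have to be inferred from the source code.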

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Natural Language Processing Techniques · Big Data and Digital Economy