Omniwise: Predicting GPU Kernels Performance with LLMs
Zixian Wang, Cole Ramos, Muhammad A. Awad, and Keith Lowery

TL;DR
Omniwise introduces a novel, lightweight LLM-based pipeline for predicting GPU kernel performance metrics directly from code, achieving high accuracy without execution or profiling, and integrating seamlessly into developer workflows.
Contribution
The paper presents the first end-to-end, self-supervised fine-tuning pipeline that applies LLMs to GPU kernel performance prediction. The pipeline is model-agnostic and produces accurate predictions.
Findings
Over 90% of predictions fall within 10% relative error.
Works effectively on AMD MI250 and MI300X architectures.
Provides tools like an inference server and VS Code plugin for practical use.
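The "within 10% relative error" figure can be read as the share of predictions whose relative deviation from the measured value is at most 0.10. The sketch below shows one way to compute such a share. It is an illustration only, not the paper's evaluation code; the function name `fraction_within_tolerance` and the sample values are hypothetical.

```python
def fraction_within_tolerance(predicted, measured, tolerance=0.10):
    """Return the fraction of predictions within `tolerance` relative error.

    Relative error is taken as |predicted - measured| / |measured|.
    This definition is an assumption made for illustration.
    """
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")

    hits = 0
    for p, m in zip(predicted, measured):
        if m == 0:
            # Relative error is undefined when the measured value is zero.
            raise ValueError("measured values must be non-zero")
        if abs(p - m) / abs(m) <= tolerance:
            hits += 1
    return hits / len(predicted)


# Hypothetical values (e.g. memory bandwidth in GB/s), not data from the paper.
predicted = [105.0, 80.0, 300.0, 50.0]
measured = [100.0, 100.0, 310.0, 50.0]
share = fraction_within_tolerance(predicted, measured)
```

In this made-up sample, three of the four predictions fall inside the 10% band, so `share` is 0.75.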
Abstract
In recent years, the rapid advancement of deep neural networks (DNNs) has revolutionized artificial intelligence, enabling models with unprecedented capabilities in understanding, generating, and processing complex data. These powerful architectures have transformed a wide range of downstream applications, tackling tasks beyond human reach. In this paper, we introduce Omniwise, the first end-to-end, self-supervised fine-tuning pipeline that applies large language models (LLMs) to GPU kernel performance prediction--a novel use case in performance profiling. Omniwise is model-agnostic and lightweight, achieving strong results even with a small 3B-parameter model. It can predict key performance metrics, including memory bandwidth, cache hit rates, GFLOPs, and arithmetic intensity, directly from kernel code without the need for code execution or profiling tools. Our approach achieves over…
Taxonomy
Topics: Machine Learning and Data Classification
