Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search
Daniel Nichols, Konstantinos Parasyris, Caetano Melone, Tal Ben-Nun, Giorgis Georgakoudis, Harshitha Menon

TL;DR
Record-Remix-Replay (R^3) is a hierarchical GPU kernel optimization framework that combines several search techniques to automate and accelerate performance tuning across multiple optimization dimensions, from source implementation to compiler settings and kernel launch parameters.
Contribution
The paper introduces R^3, a novel hierarchical optimization framework that integrates LLM-driven evolutionary search, Bayesian optimization, and record-replay compilation techniques to tune GPU kernels across the full optimization space.
Findings
R^3 outperforms traditional kernel optimization methods.
It is nearly ten times faster than existing evolutionary search approaches.
R^3 achieves better optimization results for scientific applications.
Abstract
As high-performance computing and AI workloads become increasingly dependent on GPUs, maintaining high performance across rapidly evolving hardware generations has become a major challenge. Developers often spend months tuning scientific applications to fully exploit new architectures, navigating a complex optimization space that spans algorithm design, source implementation, compiler flags and pass sequences, and kernel launch parameters. Existing approaches can effectively search parts of this space in isolation, such as launch configurations or compiler settings, but optimizing across the full space still requires substantial human expertise and iterative manual effort. In this paper, we present Record-Remix-Replay (R^3), a hierarchical optimization framework that combines LLM-driven evolutionary search, Bayesian optimization, and record-replay compilation techniques to efficiently…
