KernelBand: Steering LLM-based Kernel Optimization via Hardware-Aware Multi-Armed Bandits
Dezhi Ran, Shuxiao Xie, Mingfang Ji, Anmin Liu, Mengzhou Wu, Yuan Cao, Yuzhe Guo, Hao Yu, Linyi Li, Yitao Hu, Wei Yang, Tao Xie

TL;DR
KernelBand introduces a hardware-aware multi-armed bandit framework for optimizing GPU kernels. It balances exploration and exploitation to improve kernel performance for LLM serving across diverse hardware.
Contribution
KernelBand formulates kernel optimization as a multi-armed bandit problem and introduces two mechanisms for efficient exploration: hardware-aware pruning and trace-driven clustering.
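The summary does not say which bandit algorithm KernelBand uses. The sketch below therefore assumes the classic UCB1 rule to show what "kernel optimization as a multi-armed bandit" can look like. Each arm is an optimization strategy, and each pull stands in for "have the LLM apply this strategy, then profile the result". The strategy names, the simulated rewards, and the helpers `ucb1_select` and `run_bandit` are all hypothetical. This is an illustration of the general technique, not the paper's method.

```python
import math
import random


def ucb1_select(counts, sums, t, c=math.sqrt(2)):
    """Return the index of the arm with the highest UCB1 score.

    counts[i] is how often arm i was tried, sums[i] is its total reward,
    and t is the 1-based round number. Untried arms are chosen first.
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # Mean reward (exploitation) plus a confidence bonus (exploration).
    return max(
        range(len(counts)),
        key=lambda i: sums[i] / counts[i] + c * math.sqrt(math.log(t) / counts[i]),
    )


def run_bandit(true_speedups, rounds=1000, seed=0):
    """Simulate choosing among optimization strategies (arms).

    true_speedups are hypothetical mean rewards in [0, 1]. Each pull
    returns a noisy reward clipped to [0, 1], standing in for a profiled,
    normalized speedup of the generated kernel.
    """
    rng = random.Random(seed)
    counts = [0] * len(true_speedups)
    sums = [0.0] * len(true_speedups)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, sums, t)
        reward = min(1.0, max(0.0, rng.gauss(true_speedups[arm], 0.1)))
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    # Hypothetical strategy names, for illustration only.
    strategies = ["tiling", "vectorized loads", "kernel fusion"]
    counts = run_bandit([0.2, 0.5, 0.8])
    for name, n in zip(strategies, counts):
        print(f"{name}: {n} pulls")
```

Over time, UCB1 spends most pulls on the strategy with the highest observed reward. It still revisits weaker arms occasionally, which is the exploration/exploitation balance the summary describes.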
Findings
Achieves over 33% average performance improvement.
Outperforms state-of-the-art methods across multiple GPU architectures.
Provides theoretical guarantees on sample efficiency.
Abstract
High-performance GPU kernels are critical for efficient LLM serving, yet their optimization remains a bottleneck requiring deep system expertise. While code LLMs show promise in generating functionally correct code, kernel optimization is intrinsically a search problem over a vast optimization space. This fundamental mismatch prevents existing LLM agents from efficiently exploring the optimization space for diverse hardware and compute patterns. To bridge this gap, we present KernelBand, a framework that formulates kernel optimization as a Multi-Armed Bandit (MAB) problem, explicitly balancing exploration and exploitation to unlock the potential of code LLMs. To navigate the infinite arm space of optimization strategies applied to candidate kernels, we design two key mechanisms: a hardware-aware pruning strategy via profiling bounds and a trace-driven clustering algorithm that leverages…
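The abstract mentions "a hardware-aware pruning strategy via profiling bounds" but does not define the bounds. The sketch below assumes a simple roofline model as one plausible reading. A kernel's runtime cannot drop below the slower of its compute time and its memory time at peak hardware rates. If the measured runtime is already close to that bound, every arm is pruned. Otherwise, only strategies that target the kernel's bottleneck resource are kept. The classes `Profile` and `Hardware`, the function `prune_arms`, the `min_headroom` threshold, the strategy names, and the hardware numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical profiling summary for one candidate kernel."""
    flops: float        # floating-point operations executed
    bytes_moved: float  # DRAM bytes read + written
    runtime_s: float    # measured runtime in seconds


@dataclass
class Hardware:
    """Hypothetical peak rates for the target GPU."""
    peak_flops: float   # FLOP/s
    peak_bw: float      # bytes/s


def roofline_time(p: Profile, hw: Hardware) -> float:
    """Lower bound on runtime: the kernel cannot beat its slower roof."""
    return max(p.flops / hw.peak_flops, p.bytes_moved / hw.peak_bw)


def bottleneck(p: Profile, hw: Hardware) -> str:
    """Classify the kernel as compute-bound or memory-bound."""
    compute_t = p.flops / hw.peak_flops
    memory_t = p.bytes_moved / hw.peak_bw
    return "compute" if compute_t >= memory_t else "memory"


def prune_arms(p, hw, strategies, min_headroom=1.1):
    """Keep only strategies worth exploring for this kernel.

    strategies maps a strategy name to the resource it targets
    ("compute" or "memory"). If the measured runtime is within
    min_headroom of the roofline bound, all arms are pruned.
    """
    headroom = p.runtime_s / roofline_time(p, hw)
    if headroom < min_headroom:
        return []
    target = bottleneck(p, hw)
    return [name for name, res in strategies.items() if res == target]


if __name__ == "__main__":
    hw = Hardware(peak_flops=100e12, peak_bw=2e12)  # illustrative numbers
    strategies = {
        "tensor-core MMA": "compute",
        "vectorized loads": "memory",
        "kernel fusion": "memory",
    }
    print(prune_arms(Profile(1e9, 1e9, 2e-3), hw, strategies))
    print(prune_arms(Profile(1e9, 1e9, 5.2e-4), hw, strategies))
```

In this example, the first kernel is memory-bound and runs about 4x slower than its roofline bound, so the two memory-targeting strategies survive. The second kernel is already within 10% of the bound, so every arm is pruned. This kind of pruning shrinks the arm space before the bandit spends any LLM calls on it.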
