Rethinking Kernel Program Repair: Benchmarking and Enhancing LLMs with RGym
Kareem Shehada, Yifan Wu, Wyatt D. Feng, Adithya Iyer, Gryphon Kumfert, Yangruibo Ding, Zhiyun Qian

TL;DR
This paper introduces RGym, a lightweight framework for evaluating and improving large language models on Linux kernel program repair. Combined with practical localization techniques, it improves repair success rates at low cost.
Contribution
The paper presents RGym, a kernel-specific APR benchmark and pipeline that runs on local hardware and improves repair success rates through localization and feedback strategies.
Findings
Up to 43.36% pass rate with GPT-5 Thinking
Cost under $0.20 per bug
Localization and feedback significantly improve success
Abstract
Large Language Models (LLMs) have revolutionized automated program repair (APR), but current benchmarks like SWE-Bench predominantly focus on userspace applications and overlook the complexities of kernel-space debugging and repair. The Linux kernel poses unique challenges due to its monolithic structure, concurrency, and low-level hardware interactions. Prior efforts such as KGym and CrashFixer have highlighted the difficulty of APR in this domain, either reporting low success rates or relying on complex, costly pipelines and expensive cloud infrastructure. In this work, we introduce RGym, a lightweight, platform-agnostic APR evaluation framework for the Linux kernel designed to operate on local commodity hardware. Built on RGym, we propose a simple yet effective APR pipeline leveraging specialized localization techniques (e.g., call stacks and blamed commits) to overcome the unrealistic usage…
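To make call-stack localization concrete, here is a minimal Python sketch that pulls candidate function and file locations out of a symbolized kernel crash report. It is not RGym's implementation: the frame regex, the `NOISE_PREFIXES` list, the `candidate_locations` helper, and the sample report are all assumptions made for this example, and the report format is modeled loosely on syzbot-style traces.

```python
import re

# Assumed frame format (syzbot-style symbolized traces), for example:
#   " foo_release+0x4a/0x90 drivers/foo/foo.c:120"
#   " __dump_stack lib/dump_stack.c:88 [inline]"
FRAME_RE = re.compile(
    r"^\s*(?P<func>[A-Za-z_][\w.]*)(?:\+0x[0-9a-f]+/0x[0-9a-f]+)?\s+"
    r"(?P<file>[\w./-]+\.[chS]):(?P<line>\d+)"
)

# Hypothetical filter: reporting helpers that rarely contain the actual bug.
NOISE_PREFIXES = ("dump_stack", "__dump_stack", "kasan_", "__kasan", "print_")


def candidate_locations(report, limit=5):
    """Return up to `limit` unique (function, file, line) tuples, in stack order."""
    seen = set()
    results = []
    for raw in report.splitlines():
        match = FRAME_RE.match(raw)
        if match is None:
            continue  # headers, <TASK> markers, register dumps, etc.
        func, path = match["func"], match["file"]
        if func.startswith(NOISE_PREFIXES):
            continue  # skip generic reporting frames
        if (func, path) in seen:
            continue  # keep only the first occurrence of each frame
        seen.add((func, path))
        results.append((func, path, int(match["line"])))
        if len(results) == limit:
            break
    return results


# Invented sample report, for illustration only.
SAMPLE_REPORT = """\
BUG: KASAN: slab-use-after-free in foo_release+0x4a/0x90 drivers/foo/foo.c:120
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
 print_report+0xc1/0x5e0 mm/kasan/report.c:462
 kasan_report+0xbe/0xf0 mm/kasan/report.c:572
 foo_release+0x4a/0x90 drivers/foo/foo.c:120
 __fput+0x27c/0xa90 fs/file_table.c:320
 </TASK>
"""

if __name__ == "__main__":
    for func, path, line in candidate_locations(SAMPLE_REPORT):
        print(f"{func} ({path}:{line})")
```

A pipeline could pass the resulting locations to the model as context in place of an oracle list of files to edit. Blamed-commit localization would need repository history and is not covered by this sketch.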
Taxonomy
Topics: Software System Performance and Reliability · Security and Verification in Computing · Software Testing and Debugging Techniques
