Improving HPC Code Generation Capability of LLMs via Online Reinforcement Learning with Real-Machine Benchmark Rewards

Ryo Mikasa; Shun-ichiro Hayashi; Daichi Mukunoki; Tetsuya Hoshino; Takahiro Katagiri

arXiv:2602.12049 · cs.LG · February 13, 2026

TL;DR

This paper introduces an online reinforcement learning method that uses performance measured by benchmarking on real machines as the reward to improve the HPC code generation capabilities of large language models. It demonstrates improved performance on a double-precision matrix multiplication task.
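The following sketch illustrates a benchmark-based reward. It times a double-precision matrix multiplication kernel and converts the best runtime to GFLOPS using the standard count of 2mnk floating-point operations for an m×k by k×n product. Several of its assumptions do not come from the paper. A Python callable stands in for the compiled code that the authors run on a supercomputer. `benchmark_reward` and `reference_matmul` are hypothetical names. Code that is wrong or crashes is assumed to receive a reward of zero.

```python
import random
import time
from typing import Callable, List

Matrix = List[List[float]]  # Python floats are IEEE 754 double precision


def dgemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput of C = A @ B with A (m x k) and B (k x n): 2*m*n*k flops."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return 2.0 * m * n * k / (seconds * 1e9)


def reference_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive triple loop, used here only as a correctness reference."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def benchmark_reward(kernel: Callable[[Matrix, Matrix], Matrix],
                     n: int = 48, repeats: int = 3, seed: int = 0) -> float:
    """Hypothetical reward: best-of-`repeats` GFLOPS, or 0.0 if the kernel is wrong."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    expected = reference_matmul(a, b)
    try:
        c = kernel(a, b)
        # Check shape and values with a small relative tolerance.
        ok = len(c) == n and all(
            len(c[i]) == n
            and abs(c[i][j] - expected[i][j]) <= 1e-9 * (1.0 + abs(expected[i][j]))
            for i in range(n) for j in range(n))
    except Exception:
        ok = False  # crashes are treated like wrong answers (assumption)
    if not ok:
        return 0.0
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(a, b)
        best = min(best, time.perf_counter() - start)
    return dgemm_gflops(n, n, n, best)
```

Taking the best of several repetitions reduces timing noise. The correctness check runs before any timing, so incorrect kernels can never earn a reward for being fast.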

Contribution

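The paper presents a novel online reinforcement learning (RL) approach that uses performance feedback from real machines, together with a staged optimization algorithm (Staged Quality-Diversity, SQD), to improve the HPC code generation of LLMs.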
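The paper describes SQD only at a high level: it progressively varies the permitted optimization techniques on a per-problem basis. The following sketch is therefore one hypothetical way to organize such a scheme and is not the authors' algorithm. It assumes a cumulative schedule in which stage s permits the first s + 1 techniques from an invented list. It keeps a quality-diversity style archive that stores the fastest code found under each permitted-technique set. The names `TECHNIQUES`, `ProblemState`, and `build_prompt` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

# Hypothetical technique list for a DGEMM kernel; not taken from the paper.
TECHNIQUES = ("loop_reordering", "cache_blocking", "unrolling", "simd", "openmp")


@dataclass
class ProblemState:
    """Per-problem stage plus an elite archive keyed by the permitted-technique set."""
    stage: int = 0
    archive: Dict[FrozenSet[str], Tuple[float, str]] = field(default_factory=dict)

    def permitted(self) -> FrozenSet[str]:
        # Assumed cumulative schedule: stage s permits the first s + 1 techniques.
        return frozenset(TECHNIQUES[: self.stage + 1])

    def record(self, gflops: float, code: str) -> bool:
        """Keep the fastest code seen under the current permission set (quality)."""
        key = self.permitted()
        best = self.archive.get(key)
        if best is None or gflops > best[0]:
            self.archive[key] = (gflops, code)
            return True
        return False

    def advance(self) -> None:
        """Move to the next stage, i.e. a new niche (diversity), until the list ends."""
        if self.stage + 1 < len(TECHNIQUES):
            self.stage += 1


def build_prompt(problem: str, state: ProblemState) -> str:
    """Hypothetical prompt telling the model which techniques it may use."""
    allowed = ", ".join(t for t in TECHNIQUES if t in state.permitted())
    return f"{problem}\nOptimize using only these techniques: {allowed}."
```

Because the archive is keyed by permission set, a slower solution built from a different set of techniques is still retained. Quality-diversity methods are usually preferred over a single global best for this reason.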

Findings

1. Reinforcement learning with rewards measured by benchmarking on real machines improves the performance of the generated code.
2. The staged optimization algorithm (SQD) enables the model to learn code optimization from diverse perspectives.
3. Experimental results show improved HPC code generation capability.

Abstract

Large language models (LLMs) have demonstrated strong code generation capabilities, yet the runtime performance of generated code is not guaranteed, and there have been few attempts to train LLMs using runtime performance as a reward in the HPC domain. We propose an online reinforcement learning approach that executes LLM-generated code on a supercomputer and directly feeds back the measured runtime performance (GFLOPS) as a reward. We further introduce a Staged Quality-Diversity (SQD) algorithm that progressively varies the permitted optimization techniques on a per-problem basis, enabling the model to learn code optimization from diverse perspectives. We build a distributed system connecting a GPU training cluster with a CPU benchmarking cluster, and train Qwen2.5 Coder 14B on a double-precision matrix multiplication task using Group Relative Policy Optimization (GRPO). Through two…
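Group Relative Policy Optimization (GRPO) does not use a learned value network. Instead, it samples a group of completions for the same prompt and normalizes each completion's reward against the group's mean and standard deviation. The following minimal sketch shows only that advantage computation. It assumes the rewards are measured GFLOPS values, with zero for code that fails. `group_relative_advantages` is a hypothetical helper name. The sketch omits the clipped policy-ratio objective and the KL penalty of full GRPO.

```python
from statistics import fmean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: each reward normalized within its sampling group.

    All completions in `rewards` were sampled for the same prompt, so the group
    mean serves as the baseline instead of a learned critic.
    """
    if not rewards:
        return []
    mu = fmean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    # eps avoids division by zero when every completion earns the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, rewards of 0, 2, 4, and 6 GFLOPS in one group yield advantages of roughly -1.34, -0.45, 0.45, and 1.34. Only candidates faster than the group average are reinforced. Normalizing within each group also keeps advantages comparable when different problems reach very different absolute GFLOPS.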
