CodeScaler: Scaling Code LLM Training and Test-Time Inference via Reward Models

Xiao Zhu; Xinyu Zhou; Boyu Zhu; Hanxu Hu; Mingzhe Du; Haotian Zhang; Huiming Wang; Zhijiang Guo

arXiv:2602.17684 · cs.LG · May 19, 2026

TL;DR

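CodeScaler is a reward model for code LLMs, trained on curated preference data with syntax-aware code extraction and validity-preserving reward shaping. It supplies the reward signal for both RL training and test-time inference without relying on unit tests, improving benchmark scores and reducing latency.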

Contribution

The paper introduces a scalable reward model trained on curated preferences and synthetic data, improving code generation and inference efficiency without relying on test cases.
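As a rough illustration of what training a reward model on preference pairs usually involves, the sketch below computes a Bradley-Terry style pairwise loss for one (chosen, rejected) pair of scalar rewards. This page does not state which objective CodeScaler uses, so the loss form and the function name `pairwise_preference_loss` are assumptions for illustration only.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood for one preference pair.

    Equals -log(sigmoid(reward_chosen - reward_rejected)).
    Assumption: a common reward-model objective, not confirmed as the
    objective used by CodeScaler.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable softplus(-margin), avoiding overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # The loss shrinks as the preferred solution is scored higher.
    for chosen, rejected in [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (0.0, 3.0)]:
        print(chosen, rejected, round(pairwise_preference_loss(chosen, rejected), 4))
```

Minimizing this quantity over many pairs pushes the model to score the preferred solution above the rejected one; no test cases are executed at any point.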

Findings

01 Across four coding benchmarks, outperforms execution-based RL by +1.55 points on Qwen3-8B-Base and +4.23 points on Qwen3-14B-Base.

02 Scaling to 44K problems with additional synthetic data yields a +14.64-point improvement over base models.

03 Achieves performance similar to unit-test-based approaches with a 10-fold reduction in latency.

Abstract

Reinforcement Learning from Verifiable Rewards (RLVR) has driven recent progress in code large language models by leveraging execution-based feedback from unit tests, but its scalability is fundamentally constrained by the availability and reliability of high-quality test cases. We propose CodeScaler, a reward model designed to scale both reinforcement learning training and test-time inference for code generation. CodeScaler is trained on carefully curated preference data derived from verified code problems and incorporates syntax-aware code extraction and validity-preserving reward shaping to ensure stable and robust optimization. Across four coding benchmarks, CodeScaler consistently outperforms execution-based RL by +1.55 points on Qwen3-8B-Base and +4.23 points on Qwen3-14B-Base. By further scaling to 44K problems with additional synthetic data, CodeScaler yields +14.64 points…
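The abstract names syntax-aware code extraction, validity-preserving reward shaping, and test-time use of the reward model without giving details here. The sketch below shows one plausible reading under stated assumptions: extract the last fenced block that parses as Python, assign a fixed floor reward when nothing parses, and pick the best of N candidates by score. The names `extract_valid_code`, `shaped_reward`, `best_of_n`, the constant `INVALID_REWARD`, and the `score_fn` callable (a stand-in for a reward model) are all hypothetical and not taken from the paper or its repository.

```python
import ast
import re
from typing import Callable, List, Optional

# Matches a fenced block, optionally tagged python/py; `{3} means three backticks.
FENCE = re.compile(r"`{3}(?:python|py)?[ \t]*\n(.*?)`{3}", re.DOTALL)

# Hypothetical floor reward for responses with no syntactically valid code.
INVALID_REWARD = -1.0


def extract_valid_code(response: str) -> Optional[str]:
    """Return the last non-empty fenced block that parses as Python, else None."""
    for block in reversed(FENCE.findall(response)):
        if not block.strip():
            continue
        try:
            ast.parse(block)
        except (SyntaxError, ValueError):
            continue
        return block
    return None


def shaped_reward(response: str, score_fn: Callable[[str], float]) -> float:
    """Score only valid code; invalid responses get the fixed floor reward."""
    code = extract_valid_code(response)
    if code is None:
        return INVALID_REWARD
    return score_fn(code)


def best_of_n(responses: List[str], score_fn: Callable[[str], float]) -> str:
    """Test-time selection: return the candidate with the highest shaped reward."""
    return max(responses, key=lambda r: shaped_reward(r, score_fn))


if __name__ == "__main__":
    fence = "`" * 3
    good = f"Here is a fix:\n{fence}python\ndef add(a, b):\n    return a + b\n{fence}"
    bad = f"Try this:\n{fence}python\ndef add(a, b:\n    return a + b\n{fence}"
    # Toy scorer standing in for a reward model: a constant positive score.
    print(best_of_n([bad, good], lambda code: 0.5) == good)
```

Because selection needs only a forward pass of the scorer per candidate rather than sandboxed execution of test suites, a setup like this is where a latency advantage over unit-test reranking would come from.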

Code & Models

Repositories

lark-ai-lab/CodeScaler (GitHub)

Datasets

LARK-Lab/CodeScalerPair-51K (dataset, 40 downloads)

Taxonomy

Topics: Topic Modeling · Software Testing and Debugging Techniques · Domain Adaptation and Few-Shot Learning