Schedule-and-Calibrate: Utility-Guided Multi-Task Reinforcement Learning for Code LLMs

Yujia Chen; Yang Ye; Xiao Chu; Yuchi Ma; Cuiyun Gao

arXiv:2605.06111 · cs.SE · May 8, 2026

TL;DR

This paper introduces ASTOR, a utility-guided multi-task reinforcement learning framework for code LLMs that dynamically allocates training resources based on task utility, improving performance on four coding tasks by 9.0%-9.5%.

Contribution

ASTOR's novel utility-driven scheduling and calibration modules enable more effective multi-task training for code LLMs, outperforming existing methods.

Findings

01

ASTOR improves performance on four coding tasks by 9.0%-9.5%.

02

ASTOR surpasses the strongest MTRL baseline by 7.5%-12.8%.

03

ASTOR outperforms task-specific specialists in experiments.

Abstract

Reinforcement learning (RL) with verifiable rewards has proven effective for post-training LLMs on coding tasks, yet deploying separate task-specific specialists incurs costs that scale with the number of tasks, which motivates a unified multi-task RL (MTRL) approach. However, existing MTRL methods treat all coding tasks uniformly, relying on fixed data curricula under a shared optimization strategy, which limits the effectiveness of multi-task training. To address these limitations, we propose ASTOR, a multi-tASk code reinforcement learning framework via uTility-driven coORdination. Centered on task utility, a signal capturing each task's learning potential and cross-task synergy, ASTOR comprises two coupled modules: 1) a Hierarchical Utility-Routed Data Scheduling module that hierarchically allocates training budget and prioritizes informative prompts, steering training toward the most valuable…
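To make the scheduling idea in the abstract concrete, here is a minimal Python sketch of a two-level scheduler. It is an illustration under stated assumptions, not ASTOR's actual algorithm, since the abstract does not give the formulas. The function names `allocate_budget` and `select_prompts`, the softmax weighting, the `temperature` parameter, and the use of "pass rate closest to 0.5" as a proxy for prompt informativeness are all hypothetical choices made for this example.

```python
import math


def allocate_budget(utilities, total_budget, temperature=1.0):
    """Split `total_budget` prompts across tasks.

    Assumption: budget shares follow softmax(utility / temperature).
    The paper's actual allocation rule is not stated in the abstract.
    """
    names = list(utilities)
    top = max(utilities.values())  # subtract the max for numerical stability
    weights = [math.exp((utilities[n] - top) / temperature) for n in names]
    total_weight = sum(weights)
    raw = [total_budget * w / total_weight for w in weights]

    # Round down, then hand leftover slots to the largest fractional parts
    # so the counts always sum to exactly `total_budget`.
    counts = [int(r) for r in raw]
    leftover = total_budget - sum(counts)
    by_remainder = sorted(
        range(len(names)), key=lambda i: raw[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return dict(zip(names, counts))


def select_prompts(pass_rates, k):
    """Pick the k prompts whose observed pass rate is closest to 0.5.

    Assumption: prompts the model always solves or always fails carry
    little learning signal, so mid-range pass rates are treated as the
    most informative. This is a stand-in for the paper's prioritization.
    """
    ranked = sorted(pass_rates, key=lambda p: abs(pass_rates[p] - 0.5))
    return ranked[:k]


# Hypothetical task utilities and per-prompt pass rates.
budget = allocate_budget({"gen": 2.0, "repair": 1.0, "test": 0.0}, total_budget=64)
chosen = select_prompts({"p1": 0.0, "p2": 0.5, "p3": 0.9, "p4": 0.4}, k=2)
```

In this sketch the task-level step decides how many prompts each task receives per training round, and the prompt-level step decides which prompts fill that quota. A lower `temperature` concentrates the budget on the highest-utility tasks, while a higher one moves the split toward uniform.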
