Schedule-and-Calibrate: Utility-Guided Multi-Task Reinforcement Learning for Code LLMs
Yujia Chen, Yang Ye, Xiao Chu, Yuchi Ma, Cuiyun Gao

TL;DR
This paper introduces ASTOR, a utility-guided multi-task reinforcement learning framework for code LLMs that dynamically allocates training resources based on task utility, improving performance on four coding tasks by 9.0%-9.5%.
Contribution
ASTOR's novel utility-driven scheduling and calibration modules enable more effective multi-task training for code LLMs, outperforming existing methods.
Findings
ASTOR improves performance on four coding tasks by 9.0%-9.5%.
ASTOR surpasses the strongest MTRL baseline by 7.5%-12.8%.
ASTOR outperforms task-specific specialists in experiments.
Abstract
Reinforcement learning (RL) with verifiable rewards has proven effective at post-training LLMs for coding, yet deploying separate task-specific specialists incurs costs that scale with the number of tasks, motivating a unified multi-task RL (MTRL) approach. However, existing MTRL methods treat all coding tasks uniformly, relying on fixed data curricula under a shared optimization strategy, which limits the effectiveness of multi-task training. To address these limitations, we propose ASTOR, a multi-tASk code reinforcement learning framework via uTility-driven coORdination. Centered on task utility, a signal capturing each task's learning potential and cross-task synergy, ASTOR comprises two coupled modules: 1) a Hierarchical Utility-Routed Data Scheduling module that hierarchically allocates training budget and prioritizes informative prompts, steering training toward the most valuable…
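The abstract describes allocating training budget across tasks according to task utility. As a rough illustration only, the sketch below splits a fixed prompt budget across tasks in proportion to a softmax over per-task utility scores. The softmax rule, the `temperature` parameter, the function name `allocate_budget`, and the task names and utility values are all assumptions made for this example; the abstract does not specify how ASTOR computes utility or converts it into a budget.

```python
import math


def allocate_budget(utilities, total_budget, temperature=1.0):
    """Split total_budget prompts across tasks via softmax(utility / temperature).

    Hypothetical allocation rule for illustration; not taken from the paper.
    """
    names = list(utilities)
    # Subtract the max utility before exponentiating for numerical stability.
    peak = max(utilities.values())
    weights = [math.exp((utilities[n] - peak) / temperature) for n in names]
    norm = sum(weights)
    shares = [w / norm * total_budget for w in weights]

    # Floor each share, then hand leftover prompts to the largest remainders
    # so the integer counts sum exactly to total_budget.
    counts = [int(s) for s in shares]
    leftover = total_budget - sum(counts)
    by_remainder = sorted(
        range(len(names)), key=lambda i: shares[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return dict(zip(names, counts))


# Hypothetical utilities for four placeholder tasks (values are made up).
example_utilities = {"task_a": 0.9, "task_b": 0.3, "task_c": 0.1, "task_d": 0.5}
example_allocation = allocate_budget(example_utilities, total_budget=64)
print(example_allocation)
```

A lower `temperature` concentrates the budget on the highest-utility task, while a higher one approaches a uniform split, which corresponds to the fixed, uniform treatment the abstract attributes to existing MTRL methods.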
