Domain-Adaptable Reinforcement Learning for Code Generation with Dense Rewards

Erfan Aghadavoodi Jolfaei; Daniel Maninger; Abhinav Anand; Mert Tiftikci; Mira Mezini

arXiv:2605.21180 · cs.LG · May 21, 2026

TL;DR

This paper introduces a reinforcement learning framework that fine-tunes pre-trained language models for domain-specific code generation, improving correctness and executability in robotics and general programming tasks.

Contribution

It presents a customizable, execution-aware reinforcement learning approach with token-level credit assignment to adapt language models to diverse code generation domains.
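As a rough illustration of what token-level credit assignment could look like, the sketch below spreads a sequence-level reward evenly over the generated tokens and overrides it with a penalty for tokens on the line where execution failed. This is an assumption made for illustration only: the function name `map_reward_to_tokens`, the `error_line` input, and the `penalty` value are hypothetical, and the summary above does not describe the paper's actual mapping.

```python
from typing import List, Optional


def map_reward_to_tokens(
    tokens: List[str],
    sequence_reward: float,
    error_line: Optional[int] = None,
    penalty: float = -1.0,
) -> List[float]:
    """Turn one sequence-level reward into one reward per token.

    Hypothetical scheme (not taken from the paper): every token gets an
    equal share of the sequence reward, except tokens on the 1-indexed
    source line reported as failing, which get a fixed penalty instead.
    """
    if not tokens:
        return []
    share = sequence_reward / len(tokens)
    rewards: List[float] = []
    line = 1  # line number on which the current token starts
    for tok in tokens:
        on_error_line = error_line is not None and line == error_line
        rewards.append(penalty if on_error_line else share)
        # Advance the line counter after newline characters in this token.
        line += tok.count("\n")
    return rewards


# Toy tokenization of "x = 1\ny = x / 0\n"; line 2 raises ZeroDivisionError.
tokens = ["x", " =", " 1", "\n", "y", " =", " x", " /", " 0", "\n"]
dense = map_reward_to_tokens(tokens, sequence_reward=0.5, error_line=2)
```

In this toy case the four tokens of the first line each receive an equal share of the sequence reward, while the six tokens of the failing line receive the penalty, so the learning signal is concentrated where the execution outcome points.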

Findings

01. 19% increase in pass@1 on MBPP/MBPP+ benchmarks

02. 51% reduction in execution failures on RoboEval

03. Substantial improvements in functional correctness and simulator executability

Abstract

Large language models show strong potential for automated code generation, but lack guarantees for correctness, quality, safety, and domain-specific constraints. For instance, in robotics, where code generation is increasingly used for planning and executing actions, awareness of the environment and physical constraints is critical. To facilitate the adaptation of code-generating LLMs to diverse requirements, including domain-specific ones, we present a reinforcement learning framework that fine-tunes pre-trained LLMs using proximal policy optimization. Our customizable execution-aware reward formula captures and optimizes syntax, functional correctness, code style, security, and simulator executability. A token-level reward mapping mechanism enables effective credit assignment from execution outcomes to generated tokens. The framework is evaluated on general-purpose code generation…
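To make the idea of a customizable, execution-aware reward more concrete, here is a minimal sketch of a weighted reward over three of the components the abstract names: syntax, functional correctness, and code style. Everything specific in it is an assumption for illustration: the `RewardWeights` values, the helper names, the line-length style proxy, and the plain weighted sum are hypothetical, and the abstract does not give the paper's actual formula. Security and simulator executability are omitted because they depend on external tools.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the abstract does not state actual values.
    syntax: float = 0.2
    functional: float = 0.6
    style: float = 0.2


def syntax_score(code: str) -> float:
    """Return 1.0 if the code parses as Python, else 0.0."""
    try:
        ast.parse(code)
        return 1.0
    except (SyntaxError, ValueError):
        return 0.0


def functional_score(code: str, tests: List[str]) -> float:
    """Return the fraction of assertion snippets that pass against the code."""
    if not tests:
        return 0.0
    namespace: dict = {}
    try:
        exec(code, namespace)  # untrusted code should run in a sandbox
    except Exception:
        return 0.0
    passed = 0
    for test in tests:
        try:
            exec(test, dict(namespace))  # copy so tests cannot interfere
            passed += 1
        except Exception:
            pass
    return passed / len(tests)


def style_score(code: str, max_len: int = 79) -> float:
    """Crude style proxy: fraction of lines within the length limit."""
    lines = code.splitlines() or [""]
    return sum(len(line) <= max_len for line in lines) / len(lines)


def composite_reward(
    code: str, tests: List[str], w: RewardWeights = RewardWeights()
) -> float:
    """Weighted sum of the component scores (assumed form)."""
    return (
        w.syntax * syntax_score(code)
        + w.functional * functional_score(code, tests)
        + w.style * style_score(code)
    )


tests = ["assert add(1, 2) == 3", "assert add(0, 0) == 0"]
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b) return a + b"
r_good = composite_reward(good, tests)
r_bad = composite_reward(bad, tests)
```

Adapting such a reward to a new domain would then amount to changing the weights or swapping in a different component scorer, for example one that runs the generated program in a simulator.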
