SecureCodeRL: Security-Aware Reinforcement Learning for Code Generation with Partial-Credit Rewards
Suryansh Singh Sijwali, Suman Saha

TL;DR
SecureCodeRL introduces a reinforcement learning approach with partial-credit rewards to improve security-awareness and correctness in code generation by large language models, reducing reward sparsity and enhancing syntax validity.
Contribution
The paper proposes an RL pipeline with partial-credit rewards for security-aware code generation and reports improved syntax validity while the generated code stays clean under static analysis.
Findings
Syntax validity increased from 45% (SFT) to 60% (PPO with partial credit).
Achieved a 5% test pass rate in the pilot evaluation.
Generated code remains 100% clean under static analysis.
Abstract
Large Language Models (LLMs) can generate plausible code, but in settings that require exact stdin/stdout behavior they frequently produce programs that compile yet fail tests, and in some cases they introduce security-sensitive patterns. This paper presents SecureCodeRL, a reinforcement learning (RL) pipeline for security-aware code generation that optimizes a combined reward $R = \alpha R_{\mathrm{func}} + \beta R_{\mathrm{sec}}$. The key idea is a partial-credit functional reward that assigns intermediate scores for syntactic validity, successful execution, and producing output, reducing reward sparsity that otherwise stalls learning on competitive-programming-style tasks. I evaluate supervised fine-tuning (SFT) and PPO variants on a small held-out prompt set from APPS+ and observe that PPO with partial credit (using a continued-training variant) improves syntax validity from 45% (SFT) to 60% and achieves…
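To make the reward structure in the abstract concrete, the following is a minimal sketch of a partial-credit functional reward combined with a security term. It is not the authors' implementation: the tier values (0.2, 0.4, 0.6, 1.0), the default weights `alpha` and `beta`, the function names, and the small set of risky call names used in place of a real static analyzer are all assumptions made for illustration.

```python
import ast
import subprocess
import sys

# Assumed tier values; the source does not state the actual scores.
SYNTAX_CREDIT = 0.2   # code parses, but crashes or times out
RUNS_CREDIT = 0.4     # code exits cleanly but prints nothing
OUTPUT_CREDIT = 0.6   # code prints something, but not the expected output
PASS_CREDIT = 1.0     # stdout matches the expected output

# Hypothetical stand-in for a static analyzer: a few risky call names.
RISKY_CALLS = {"eval", "exec", "system", "popen"}


def functional_reward(code: str, stdin_text: str, expected: str,
                      timeout: float = 5.0) -> float:
    """Return a tiered score instead of a sparse pass/fail signal."""
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        return 0.0
    try:
        # NOTE: a real pipeline would run untrusted code in a sandbox.
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin_text, capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return SYNTAX_CREDIT
    if proc.returncode != 0:
        return SYNTAX_CREDIT
    if not proc.stdout.strip():
        return RUNS_CREDIT
    if proc.stdout.strip() != expected.strip():
        return OUTPUT_CREDIT
    return PASS_CREDIT


def security_reward(code: str) -> float:
    """Return 1.0 if no risky call name is found, else 0.0 (assumed scheme)."""
    try:
        tree = ast.parse(code)
    except (SyntaxError, ValueError):
        return 0.0
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            # Plain names use .id, attribute calls such as os.system use .attr.
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name in RISKY_CALLS:
                return 0.0
    return 1.0


def combined_reward(code: str, stdin_text: str, expected: str,
                    alpha: float = 0.5, beta: float = 0.5) -> float:
    """R = alpha * R_func + beta * R_sec, with assumed default weights."""
    return (alpha * functional_reward(code, stdin_text, expected)
            + beta * security_reward(code))
```

Under these assumed tiers, a program that parses but crashes still earns a nonzero functional score, which is the kind of intermediate signal the abstract credits with reducing reward sparsity.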
Taxonomy
Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Software Engineering Research
