SecureCodeRL: Security-Aware Reinforcement Learning for Code Generation with Partial-Credit Rewards

Suryansh Singh Sijwali; Suman Saha

arXiv:2601.01184 · cs.CR · January 6, 2026

TL;DR

SecureCodeRL introduces a reinforcement learning approach with partial-credit rewards to improve security-awareness and correctness in code generation by large language models, reducing reward sparsity and enhancing syntax validity.

Contribution

It proposes an RL pipeline with partial-credit rewards for security-aware code generation, demonstrating improved syntax validity while keeping generated code clean under static analysis.

Findings

01

Syntax validity increased from 45% (SFT) to 60% (PPO with partial credit).

02

Achieved a 5% test-pass rate in a pilot evaluation.

03

Generated code remains 100% clean under static analysis.

Abstract

Large Language Models (LLMs) can generate plausible code, but in settings that require exact stdin/stdout behavior they frequently produce programs that compile yet fail tests, and in some cases they introduce security-sensitive patterns. This paper presents SecureCodeRL, a reinforcement learning (RL) pipeline for security-aware code generation that optimizes a combined reward $R = \alpha R_{\text{func}} + \beta R_{\text{sec}}$. The key idea is a partial-credit functional reward that assigns intermediate scores for syntactic validity, successful execution, and producing output, reducing the reward sparsity that otherwise stalls learning on competitive-programming-style tasks. I evaluate supervised fine-tuning (SFT) and PPO variants on a small held-out prompt set from APPS+ and observe that PPO with partial credit (using a continued-training variant) improves syntax validity from 45% (SFT) to 60% and achieves…
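
The combined reward and the partial-credit idea can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the score levels (0.2/0.4/0.6/1.0), the weights alpha = 0.7 and beta = 0.3, the whitespace-insensitive output check, and the AST-based `security_reward` (a toy stand-in for a real static analyzer) are all assumptions, and the function names are illustrative.

```python
import ast
import subprocess
import sys

# Hypothetical partial-credit levels (assumed values, not the paper's).
SCORE_PARSES = 0.2   # code is syntactically valid Python
SCORE_RUNS = 0.4     # program exits with status 0
SCORE_OUTPUT = 0.6   # program prints something to stdout
SCORE_PASS = 1.0     # stdout matches the expected output

# Hypothetical weights for R = alpha * R_func + beta * R_sec.
ALPHA = 0.7
BETA = 0.3


def functional_reward(code: str, stdin: str, expected: str, timeout: float = 5.0) -> float:
    """Partial-credit R_func: each stage that succeeds raises the score."""
    try:
        ast.parse(code)
    except SyntaxError:
        return 0.0
    try:
        # NOTE: subprocess is not a sandbox; real pipelines should isolate untrusted code.
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return SCORE_PARSES  # parsed but did not finish in time
    if proc.returncode != 0:
        return SCORE_PARSES  # parsed but crashed
    if not proc.stdout.strip():
        return SCORE_RUNS  # ran cleanly but produced no output
    # Whitespace-insensitive token comparison (an assumption).
    if proc.stdout.split() == expected.split():
        return SCORE_PASS
    return SCORE_OUTPUT


# Toy stand-in for a static analyzer: flag a few well-known risky calls.
RISKY_NAMES = {"eval", "exec", "compile", "__import__"}
RISKY_ATTRS = {("os", "system"), ("os", "popen"), ("pickle", "loads"), ("marshal", "loads")}


def security_reward(code: str) -> float:
    """Hypothetical R_sec: 1.0 if no risky call is found, else 0.0."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return 0.0  # unparseable code cannot be analyzed, so it earns no security credit
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in RISKY_NAMES:
            return 0.0
        if (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and (func.value.id, func.attr) in RISKY_ATTRS
        ):
            return 0.0
    return 1.0


def combined_reward(code: str, stdin: str, expected: str) -> float:
    """R = alpha * R_func + beta * R_sec."""
    return ALPHA * functional_reward(code, stdin, expected) + BETA * security_reward(code)
```

Under these assumed values, a correct and clean program such as `print(int(input()) * 2)` with stdin `21` and expected output `42` earns full credit, while a program that parses but crashes still earns 0.7 × 0.2 + 0.3 × 1.0 = 0.44 rather than the 0.3 it would receive if R_func were pass/fail. That graded signal is what counters reward sparsity.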

Taxonomy

Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Software Engineering Research