TL;DR
ReflexiCoder introduces a reinforcement learning framework that trains large language models to reflect on and correct their own code internally, achieving state-of-the-art performance on seven code generation benchmarks without external feedback.
Contribution
It presents a novel RL-based training paradigm that internalizes reasoning and correction processes into the model, reducing reliance on external tools and improving efficiency.
Findings
Achieves new SOTA on seven code generation benchmarks.
Reduces inference compute overhead by approximately 40%.
Outperforms or rivals proprietary models such as GPT-5.1.
Abstract
While Large Language Models (LLMs) have revolutionized code generation, standard "System 1" approaches that generate solutions in a single forward pass often hit a performance ceiling on complex algorithmic tasks. Existing iterative refinement strategies attempt to bridge this gap at inference time, yet they predominantly rely on external oracles, execution feedback, or computationally expensive prompt-response cycles. In this work, we propose ReflexiCoder, a novel reinforcement learning (RL) framework that internalizes the structured reasoning trajectory, encompassing initial generation, bug- and optimization-aware reflection, and self-correction, directly into the model's weights. Unlike prior methods, ReflexiCoder shifts the paradigm from externally dependent refinement to intrinsic, fully autonomous self-reflection and self-correction capabilities at inference time. We utilize an…
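The three-stage trajectory described in the abstract (initial generation, reflection, self-correction) can be pictured as a single model output with three sections. The sketch below is only an illustration under stated assumptions: the `<draft>`, `<reflection>`, and `<final>` tags, the `Trajectory` class, and the `parse_trajectory` helper are hypothetical names chosen here, not the paper's actual output format or code. It shows how a consumer might split one such output into its stages without running the code or calling any external tool.

```python
import re
from dataclasses import dataclass

# Hypothetical section tags (an assumption for illustration);
# the paper's real output format is not given in this summary.
SECTION_TAGS = ("draft", "reflection", "final")


@dataclass
class Trajectory:
    draft: str        # initial solution attempt
    reflection: str   # the model's own critique of the draft
    final: str        # corrected solution


def parse_trajectory(output: str) -> Trajectory:
    """Split one model output into its three stages.

    Raises ValueError if any expected section is missing.
    """
    parts = {}
    for tag in SECTION_TAGS:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", output, flags=re.DOTALL)
        if match is None:
            raise ValueError(f"missing <{tag}> section")
        parts[tag] = match.group(1).strip()
    return Trajectory(**parts)


# A made-up output in the assumed format, produced in one generation.
example_output = """
<draft>def add(a, b): return a - b</draft>
<reflection>Bug: subtraction is used where addition is intended.</reflection>
<final>def add(a, b): return a + b</final>
"""

trajectory = parse_trajectory(example_output)
print(trajectory.final)
```

Only the `final` section would be submitted as the answer; the draft and reflection are kept here purely so the intermediate reasoning can be inspected.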
