RefineRL: Advancing Competitive Programming with Self-Refinement Reinforcement Learning
Shaopeng Fu, Xingxing Zhang, Li Dong, Di Wang, Furu Wei

TL;DR
RefineRL trains LLMs to iteratively self-refine their competitive programming solutions using a skeptical agent and reinforcement learning, which significantly improves the performance of smaller models.
Contribution
The paper introduces a novel self-refinement framework with a skeptical agent and RL training, enabling LLMs to improve problem-solving iteratively using only standard RLVR data.
Findings
After RL training, compact 4B models outperform larger 32B models.
The same 4B models approach the single-attempt performance of 235B models.
Self-refinement significantly boosts LLM reasoning capabilities.
Abstract
While large language models (LLMs) have demonstrated strong performance on complex reasoning tasks such as competitive programming (CP), existing methods predominantly focus on single-attempt settings, overlooking their capacity for iterative refinement. In this paper, we present RefineRL, a novel approach designed to unleash the self-refinement capabilities of LLMs for CP problem solving. RefineRL introduces two key innovations: (1) Skeptical-Agent, an iterative self-refinement agent equipped with local execution tools to validate generated solutions against public test cases of CP problems. This agent always maintains a skeptical attitude towards its own outputs and thereby enforces rigorous self-refinement even when validation suggests correctness. (2) A reinforcement learning (RL) solution to incentivize LLMs to self-refine with only standard RLVR data (i.e., problems paired with…
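The Skeptical-Agent loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `run_public_tests`, `skeptical_refine`, and `propose`, the fixed round budget, and the feedback strings are all hypothetical, and candidate solutions are assumed to be Python programs that read stdin and write stdout. The `propose` callable stands in for the LLM.

```python
import subprocess
import sys
from typing import Callable, List, Tuple

PublicTest = Tuple[str, str]  # (stdin text, expected stdout)


def run_public_tests(code: str, tests: List[PublicTest],
                     timeout: float = 5.0) -> List[str]:
    """Run `code` locally on each public test and return failure messages."""
    failures = []
    for stdin_text, expected in tests:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                input=stdin_text, capture_output=True,
                text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            failures.append(f"timeout on input {stdin_text!r}")
            continue
        got = proc.stdout.strip()
        if proc.returncode != 0:
            failures.append(f"runtime error on input {stdin_text!r}: "
                            f"{proc.stderr.strip()}")
        elif got != expected.strip():
            failures.append(f"input {stdin_text!r}: expected "
                            f"{expected.strip()!r}, got {got!r}")
    return failures


def skeptical_refine(propose: Callable[[str], str],
                     tests: List[PublicTest],
                     max_rounds: int = 3) -> Tuple[str, List[int]]:
    """Refine for `max_rounds` rounds (assumed budget), never stopping early.

    `propose` is a hypothetical stand-in for the model: it maps the latest
    feedback string to a new candidate program.
    """
    feedback = "no feedback yet; write an initial solution"
    code = ""
    failure_counts = []
    for _ in range(max_rounds):
        code = propose(feedback)
        failures = run_public_tests(code, tests)
        failure_counts.append(len(failures))
        if failures:
            feedback = "public tests failed:\n" + "\n".join(failures)
        else:
            # Skeptical step: passing public tests is not taken as proof of
            # correctness, so the agent is still asked to re-examine its work.
            feedback = ("public tests passed, but hidden tests may differ; "
                        "re-check edge cases and complexity")
    return code, failure_counts
```

The point of the sketch is that the loop has no early exit on a passing run: a pass only changes the feedback, not the decision to keep refining. Returning the last candidate is a simplification; a real agent might instead keep the best candidate seen so far.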
