RefineRL: Advancing Competitive Programming with Self-Refinement Reinforcement Learning

Shaopeng Fu; Xingxing Zhang; Li Dong; Di Wang; Furu Wei

arXiv:2604.00790 · cs.AI · April 2, 2026



TL;DR

RefineRL improves LLM performance on competitive programming by training models to iteratively self-refine their solutions through a skeptical agent and reinforcement learning, with especially large gains for smaller models.

Contribution

The paper introduces a self-refinement framework that pairs a skeptical agent with RL training, enabling LLMs to iteratively improve their solutions using only standard RLVR (reinforcement learning with verifiable rewards) data.

Findings

1. Compact 4B models outperform larger 32B models after RL training.

2. RefineRL-trained models approach the single-attempt performance of 235B models.

3. Self-refinement significantly boosts LLM reasoning capabilities.

Abstract

While large language models (LLMs) have demonstrated strong performance on complex reasoning tasks such as competitive programming (CP), existing methods predominantly focus on single-attempt settings, overlooking their capacity for iterative refinement. In this paper, we present RefineRL, a novel approach designed to unleash the self-refinement capabilities of LLMs for CP problem solving. RefineRL introduces two key innovations: (1) Skeptical-Agent, an iterative self-refinement agent equipped with local execution tools to validate generated solutions against public test cases of CP problems. This agent always maintains a skeptical attitude towards its own outputs and thereby enforces rigorous self-refinement even when validation suggests correctness. (2) A reinforcement learning (RL) solution to incentivize LLMs to self-refine with only standard RLVR data (i.e., problems paired with…
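The Skeptical-Agent loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Candidate programs are modeled as Python callables from stdin text to stdout text. `propose` is a hypothetical stand-in for the LLM that sees every earlier attempt together with its test feedback. The fixed round budget (`max_rounds`) and the rule for picking the final answer are also assumptions, since the abstract does not specify them. The skeptical element is that passing every public test does not end the loop early.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Assumption: a candidate CP solution is modeled as a function from stdin text to stdout text.
Solution = Callable[[str], str]


@dataclass
class PublicCheckReport:
    passed: int
    total: int
    feedback: List[str] = field(default_factory=list)


def run_public_tests(solution: Solution, tests: List[Tuple[str, str]]) -> PublicCheckReport:
    """Local execution tool: run the candidate on each public (input, expected_output) pair."""
    report = PublicCheckReport(passed=0, total=len(tests))
    for inp, expected in tests:
        try:
            got = solution(inp).strip()
        except Exception as exc:  # a crash counts as a failed test
            report.feedback.append(f"input {inp!r}: runtime error {exc!r}")
            continue
        if got == expected.strip():
            report.passed += 1
        else:
            report.feedback.append(f"input {inp!r}: expected {expected.strip()!r}, got {got!r}")
    return report


History = List[Tuple[Solution, PublicCheckReport]]


def skeptical_refine(propose: Callable[[History], Solution],
                     tests: List[Tuple[str, str]],
                     max_rounds: int = 3) -> Tuple[Solution, History]:
    """Hypothetical Skeptical-Agent loop: refine for max_rounds even after all public tests pass."""
    history: History = []
    for _ in range(max_rounds):
        candidate = propose(history)  # the (hypothetical) LLM sees earlier attempts and feedback
        history.append((candidate, run_public_tests(candidate, tests)))
        # Skeptical step: no early exit on a full pass; a passing run is still sent back for scrutiny.
    # Assumed selection rule: the latest candidate passing every public test, else the last one.
    for candidate, report in reversed(history):
        if report.passed == report.total:
            return candidate, history
    return history[-1][0], history


# Toy problem (illustrative only): print the sum of two integers.
drafts = [
    lambda s: str(int(s.split()[0]) - int(s.split()[1])),  # buggy first draft
    lambda s: str(sum(map(int, s.split()))),               # draft revised after test feedback
]


def propose(history: History) -> Solution:
    # Stand-in for the LLM: return the next draft, repeating the last once drafts run out.
    return drafts[min(len(history), len(drafts) - 1)]


best, history = skeptical_refine(propose, tests=[("1 2", "3"), ("5 7", "12")])
print([(r.passed, r.total) for _, r in history])  # [(0, 2), (2, 2), (2, 2)]
```

Making the round budget the only stopping condition is what separates this sketch from an ordinary test-and-fix loop, which would stop at the first full pass. Public examples often miss edge cases, so a passing run is treated as a hint of correctness rather than proof.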
