OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique

Wasi Uddin Ahmad; Somshubra Majumdar; Aleksander Ficek; Sean Narenthiran; Mehrzad Samadi; Jocelyn Huang; Siddhartha Jain; Vahid Noroozi; Boris Ginsburg

arXiv:2507.09075·cs.CL·July 15, 2025

Open Access · 10 Models · 2 Datasets

TL;DR

This paper introduces OpenCodeReasoning-II, a large dataset for code reasoning, and a two-stage fine-tuning approach for LLMs that improves code generation and critique, enhancing performance on coding benchmarks.

Contribution

The paper presents a new large-scale dataset and a novel two-stage fine-tuning method for LLMs to improve code reasoning and critique capabilities.

Findings

01 Achieved state-of-the-art performance in code generation.

02 Delivered significant improvements on competitive coding tasks.

03 Extended benchmark support to the C++ language.

Abstract

Recent advancements in reasoning-based Large Language Models (LLMs), particularly their potential through test-time scaling, have created significant opportunities for distillation in code generation and critique. However, progress in both areas fundamentally depends on large-scale, high-quality datasets. In this work, we introduce OpenCodeReasoning-II, a dataset consisting of 2.5M question-solution-critique triples (approx. 35K unique programming questions), making it nearly twice the size of the previous largest publicly available code reasoning dataset. We employ a two-stage supervised fine-tuning strategy. The first stage focuses on fine-tuning for code generation, while the second stage involves the joint training of models for both code generation and critique. Our resulting fine-tuned Qwen2.5-Instruct models achieve performance in code generation that either exceeds or…
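To make the two-stage strategy in the abstract more concrete, the following is a minimal sketch of how question-solution-critique triples might be turned into training examples for each stage. It is not the authors' pipeline: the `Triple` class, the field names, the `task` labels, the prompt layout, and the sample critique text are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One question-solution-critique record (field names are assumed)."""
    question: str
    solution: str
    critique: str


def stage1_examples(triples):
    """Stage 1: code generation only, mapping a question to a solution."""
    return [
        {"task": "generate", "prompt": t.question, "target": t.solution}
        for t in triples
    ]


def stage2_examples(triples):
    """Stage 2: joint training on generation and critique.

    The generation examples are kept, and one critique example per triple
    is added. The critique prompt here is simply the question followed by
    the candidate solution, which is an assumed format.
    """
    examples = stage1_examples(triples)
    for t in triples:
        examples.append(
            {
                "task": "critique",
                "prompt": f"{t.question}\n\n{t.solution}",
                "target": t.critique,
            }
        )
    return examples


# A toy record standing in for one row of the dataset.
triples = [
    Triple(
        question="Print the sum of two integers read from one line.",
        solution="a, b = map(int, input().split())\nprint(a + b)",
        critique="Parses both integers and prints their sum. Judgement: right",
    )
]

stage1 = stage1_examples(triples)
stage2 = stage2_examples(triples)
print(len(stage1), len(stage2))
```

In this sketch the second stage reuses the generation examples so that adding the critique task does not drop the generation task. How the two tasks are actually mixed and weighted is not stated in the abstract.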

Taxonomy

Topics: Software Testing and Debugging Techniques · Model-Driven Software Engineering Techniques · Real-time simulation and control systems