CodeContests+: High-Quality Test Case Generation for Competitive Programming

Zihan Wang; Siyao Liu; Yang Sun; Hongyan Li; Kai Shen

arXiv:2506.05817·cs.SE·June 9, 2025

TL;DR

This paper presents CodeContests+, an LLM-based system for generating high-quality test cases for competitive programming, significantly improving evaluation accuracy and benefiting reinforcement learning applications.

Contribution

The paper introduces an LLM-driven approach to generating higher-quality test cases for competitive programming problems, improving dataset quality and evaluation reliability.

Findings

01 CodeContests+ achieves higher evaluation accuracy than previous datasets.

02 Improvements in test case quality lead to better reinforcement learning outcomes.

03 The True Positive Rate of test case evaluation increases significantly.

Abstract

Competitive programming, due to its high reasoning difficulty and precise correctness feedback, has become a key task for both training and evaluating the reasoning capabilities of large language models (LLMs). However, while a large amount of public problem data, such as problem statements and solutions, is available, the test cases of these problems are often difficult to obtain. Therefore, test case generation is a necessary task for building large-scale datasets, and the quality of the test cases directly determines the accuracy of the evaluation. In this paper, we introduce an LLM-based agent system that creates high-quality test cases for competitive programming problems. We apply this system to the CodeContests dataset and propose a new version with improved test cases, named CodeContests+. We evaluated the quality of test cases in CodeContests+. First, we used 1.72 million…
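
The evaluation accuracy and True Positive Rate mentioned above can be illustrated with a minimal sketch. It assumes that each submission carries a ground-truth label and that a submission counts as accepted only if it passes every generated test case. The `Submission` record and the `evaluate_test_suite` function are hypothetical names chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """A labeled submission (hypothetical record layout)."""
    is_correct: bool        # ground-truth verdict for the submission
    passes_all_tests: bool  # verdict under the generated test cases


def evaluate_test_suite(submissions):
    """Return (TPR, TNR) of a test suite over labeled submissions.

    TPR: share of truly correct submissions that the tests accept.
    TNR: share of truly incorrect submissions that the tests reject.
    """
    correct = [s for s in submissions if s.is_correct]
    incorrect = [s for s in submissions if not s.is_correct]
    true_pos = sum(s.passes_all_tests for s in correct)
    true_neg = sum(not s.passes_all_tests for s in incorrect)
    tpr = true_pos / len(correct) if correct else float("nan")
    tnr = true_neg / len(incorrect) if incorrect else float("nan")
    return tpr, tnr


# Toy data: three correct and three incorrect submissions.
sample = [
    Submission(True, True),
    Submission(True, False),   # correct solution wrongly rejected
    Submission(False, False),
    Submission(False, True),   # wrong solution wrongly accepted
    Submission(False, False),
    Submission(True, True),
]
tpr, tnr = evaluate_test_suite(sample)
print(f"TPR={tpr:.3f}, TNR={tnr:.3f}")
```

Under this reading, a low TPR means the tests wrongly reject correct solutions (for example, through invalid inputs), while a low TNR means the tests are too weak to catch incorrect solutions.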

Taxonomy

Topics: Topic Modeling · Mobile Crowdsensing and Crowdsourcing · Multimodal Machine Learning Applications