CodeContests+: High-Quality Test Case Generation for Competitive Programming
Zihan Wang, Siyao Liu, Yang Sun, Hongyan Li, Kai Shen

TL;DR
This paper presents CodeContests+, an LLM-based system for generating high-quality test cases for competitive programming, significantly improving evaluation accuracy and benefiting reinforcement learning applications.
Contribution
It introduces a novel LLM-driven approach to generate superior test cases for competitive programming problems, enhancing dataset quality and evaluation reliability.
Findings
CodeContests+ achieves higher evaluation accuracy than previous datasets.
Test case quality improvements lead to better reinforcement learning outcomes.
The True Positive Rate (TPR) of test case evaluation increases significantly.
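The accuracy and True Positive Rate mentioned in these findings can be illustrated with a small sketch. It assumes, since this summary does not spell it out, that a "positive" is a submission that is truly correct, so the True Positive Rate is the share of correct submissions that the test cases accept. The `Verdict` type and the `evaluate` function are hypothetical names introduced only for this illustration, not part of the paper's code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Verdict:
    """One submission judged by a set of test cases (hypothetical type)."""
    truly_correct: bool  # ground-truth label of the submission
    passed_tests: bool   # whether the test cases accepted the submission


def evaluate(verdicts: List[Verdict]) -> Dict[str, float]:
    """Compute TPR, TNR, and accuracy of a test suite's judgments.

    Assumption: positive = truly correct submission.
    """
    tp = sum(v.truly_correct and v.passed_tests for v in verdicts)
    fn = sum(v.truly_correct and not v.passed_tests for v in verdicts)
    tn = sum(not v.truly_correct and not v.passed_tests for v in verdicts)
    fp = sum(not v.truly_correct and v.passed_tests for v in verdicts)

    total = tp + fn + tn + fp
    return {
        # Share of correct submissions that the tests accept.
        "tpr": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of incorrect submissions that the tests reject.
        "tnr": tn / (tn + fp) if (tn + fp) else float("nan"),
        # Share of all submissions judged in line with the ground truth.
        "accuracy": (tp + tn) / total if total else float("nan"),
    }
```

Under this reading, a low True Positive Rate means the test cases wrongly reject correct solutions, while a low True Negative Rate means they let incorrect solutions through.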
Abstract
Competitive programming, due to its high reasoning difficulty and precise correctness feedback, has become a key task for both training and evaluating the reasoning capabilities of large language models (LLMs). However, while a large amount of public problem data, such as problem statements and solutions, is available, the test cases of these problems are often difficult to obtain. Therefore, test case generation is a necessary task for building large-scale datasets, and the quality of the test cases directly determines the accuracy of the evaluation. In this paper, we introduce an LLM-based agent system that creates high-quality test cases for competitive programming problems. We apply this system to the CodeContests dataset and propose a new version with improved test cases, named CodeContests+. We evaluated the quality of test cases in CodeContests+. First, we used 1.72 million…
Taxonomy
Topics: Topic Modeling · Mobile Crowdsensing and Crowdsourcing · Multimodal Machine Learning Applications
