HardTests: Synthesizing High-Quality Test Cases for LLM Coding
Zhongmou He, Yee Man Choi, Kexun Zhang, Jiabao Ji, Junting Zhou, Dejia Xu, Ivan Bercovich, Aidan Zhang, Lei Li

TL;DR
This paper introduces HARDTESTGEN, a pipeline for synthesizing high-quality test cases for LLM coding problems, significantly improving verification precision and aiding model training.
Contribution
The paper presents HARDTESTGEN, an LLM-based pipeline for synthesizing high-quality tests, uses it to build HARDTESTS, a dataset of 47k competitive programming problems, and shows that the resulting tests improve both verification and model training.
Findings
When evaluating LLM-generated code, HARDTESTGEN tests reach precision 11.3 percentage points higher and recall 17.5 percentage points higher than existing tests.
The precision gain grows on harder problems, reaching as much as 40 points.
Using HARDTESTS enhances downstream code generation performance.
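The precision and recall figures above treat a test suite as a binary classifier over candidate programs. The sketch below shows one conventional way to compute them. It assumes a ground-truth correctness label is available for each candidate (for example, from an official judge) and uses the standard classifier definitions; the summary does not spell out the paper's exact evaluation protocol, so the names `Verdict` and `precision_recall` and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """One candidate program, as seen by an oracle and by a test suite."""
    truly_correct: bool  # ground-truth label (assumed to be available)
    passes_tests: bool   # True if the candidate passes every test in the suite


def precision_recall(verdicts):
    """Return (precision, recall) of a test suite used as a verifier.

    Precision: of the programs the tests accept, the fraction that are correct.
    Recall: of the correct programs, the fraction that the tests accept.
    """
    tp = sum(v.truly_correct and v.passes_tests for v in verdicts)
    fp = sum((not v.truly_correct) and v.passes_tests for v in verdicts)
    fn = sum(v.truly_correct and (not v.passes_tests) for v in verdicts)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data, purely illustrative (not taken from the paper).
toy = [
    Verdict(truly_correct=True, passes_tests=True),
    Verdict(truly_correct=True, passes_tests=True),
    Verdict(truly_correct=True, passes_tests=False),   # false negative
    Verdict(truly_correct=False, passes_tests=True),   # false positive
    Verdict(truly_correct=False, passes_tests=False),
]
p, r = precision_recall(toy)
print(f"precision={p:.3f} recall={r:.3f}")
```

Under this reading, low precision means wrong solutions slip through weak tests, and low recall means correct solutions are rejected. A gain stated in percentage points is the absolute difference between two such rates, not a relative change.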
Abstract
Verifiers play a crucial role in large language model (LLM) reasoning, needed by post-training techniques such as reinforcement learning. However, reliable verifiers are hard to get for difficult coding problems, because a well-disguised wrong solution may only be detected by carefully human-written edge cases that are difficult to synthesize. To address this issue, we propose HARDTESTGEN, a pipeline for high-quality test synthesis using LLMs. With this pipeline, we curate a comprehensive competitive programming dataset HARDTESTS with 47k problems and synthetic high-quality tests. Compared with existing tests, HARDTESTGEN tests demonstrate precision that is 11.3 percentage points higher and recall that is 17.5 percentage points higher when evaluating LLM-generated code. For harder problems, the improvement in precision can be as large as 40 points. HARDTESTS also proves to be more…
Taxonomy
Topics: VLSI and Analog Circuit Testing · Advancements in Photolithography Techniques · Algorithms and Data Compression
