You Don't Need Public Tests to Generate Correct Code
Kaushitha Silva, Srinath Perera

TL;DR
DryRUN lets large language models generate code autonomously without public test cases. The model constructs its own test inputs and validates its logic by simulating execution on them, which reduces dependence on labor-intensive, human-authored test data.
Contribution
DryRUN, a novel framework that lets LLMs generate correct code without public tests by synthesizing their own inputs and simulating execution on them.
Findings
DryRUN matches the performance of state-of-the-art methods that depend on public tests.
It reduces reliance on external test data and decreases token usage.
DryRUN demonstrates effectiveness on the LiveCodeBench v6 dataset.
Abstract
Multi-agent systems are frequently employed for autonomous code generation, demonstrating strong utility in complex algorithmic problem-solving. Recent studies tackle the difficulty of producing functionally correct programs by leveraging simulation-guided planning and debugging, wherein language models step through execution traces to validate logic. Nevertheless, these methods rely heavily on human-authored public test cases to anchor the simulation and debugging cycles. Hand-crafting exhaustive input-output pairs creates a significant, labor-intensive bottleneck within the software development lifecycle. Since ground-truth examples are seldom accessible before actual implementation in real-world scenarios, this reliance limits existing approaches primarily to curated competitive programming datasets. Additionally, we demonstrate that depending on these public tests creates an…
