Code Generation by Differential Test Time Scaling
Yifeng He, Ethan Wang, Jicheng Wang, Xuanxin Ouyang, Hao Chen

TL;DR
DiffCodeGen is a model-agnostic test-time scaling method for code generation. It improves output quality without extra LLM calls for candidate selection, using coverage-guided differential analysis and clustering instead.
Contribution
The paper proposes DiffCodeGen, an approach that improves code generation by choosing among diverse candidates through coverage-guided differential analysis, with no additional large language model inference for selection.
Findings
Consistent performance improvements across 4 large language models.
Achieves competitive or superior results with fewer tokens and less time.
Model-agnostic approach that can be combined with reasoning models.
Abstract
Test-time scaling has emerged as a promising approach for improving code generation by exploring large solution spaces at inference time. However, existing methods often rely on public test cases that are unavailable in practice, or require extensive LLM inference for candidate selection, leading to significant token consumption and time overhead. We present DiffCodeGen, a novel test-time scaling method for code generation based on coverage-guided differential analysis. DiffCodeGen generates diverse code candidates using various sampling and prompting strategies, then applies coverage-guided fuzzing to synthesize inputs without requiring any existing tests or large language models. By executing all candidates on these inputs, DiffCodeGen captures their dynamic behavior and clusters candidates based on behavioral similarity. DiffCodeGen selects the medoid of the largest cluster as the…
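The selection step described in the abstract (run every candidate on the same inputs, cluster by behavior, take the medoid of the largest cluster) can be sketched as follows. This is a minimal illustration, not the authors' implementation. Several things here are assumptions: the fixed `inputs` list stands in for inputs that the paper synthesizes with coverage-guided fuzzing, the distance (fraction of inputs on which two candidates disagree) and the single-linkage threshold clustering are illustrative choices, and all function names are hypothetical.

```python
from collections import defaultdict


def run_safely(func, arg):
    """Record one candidate's behavior on one input as a hashable value."""
    try:
        return ("ok", repr(func(arg)))
    except Exception as exc:  # a crash is also observable behavior
        return ("error", type(exc).__name__)


def behavior_signature(func, inputs):
    """Behavior of a candidate across all inputs."""
    return tuple(run_safely(func, x) for x in inputs)


def distance(sig_a, sig_b):
    """Assumed metric: fraction of inputs on which two candidates disagree."""
    if not sig_a:
        return 0.0
    return sum(a != b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def cluster(signatures, threshold):
    """Assumed clustering: single linkage, joining pairs within `threshold`."""
    n = len(signatures)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if distance(signatures[i], signatures[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return list(groups.values())


def select_candidate(candidates, inputs, threshold=0.0):
    """Return the index of the medoid of the largest behavioral cluster."""
    sigs = [behavior_signature(c, inputs) for c in candidates]
    largest = max(cluster(sigs, threshold), key=len)
    # Medoid: the member with the smallest total distance to its cluster.
    return min(
        largest,
        key=lambda i: sum(distance(sigs[i], sigs[j]) for j in largest),
    )


# Hypothetical candidates for "absolute value"; the third one is buggy.
candidates = [
    lambda x: abs(x),
    lambda x: x if x >= 0 else -x,
    lambda x: x,
    lambda x: max(x, -x),
]
inputs = [-3, 0, 5]  # stand-in for fuzzer-synthesized inputs
chosen = select_candidate(candidates, inputs)
```

In this toy run the three correct candidates behave identically and form the largest cluster, so the buggy candidate is never selected. The selection step makes no LLM calls, which matches the cost argument in the abstract.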
