SELF-REDRAFT: Eliciting Intrinsic Exploration-Exploitation Balance in Test-Time Scaling for Code Generation

Yixiang Chen; Tianshi Zheng; Shijue Huang; Zhitao He; Yi R. Fung

arXiv:2511.02854 · cs.SE · November 6, 2025

TL;DR

This paper introduces SELF-REDRAFT, a test-time code generation framework built on Self-Refine. It lets the model balance exploitation and exploration by proposing a new draft when a solution is fundamentally flawed, and it outperforms Self-Refine under the same maximum number of iterations.

Contribution

The paper presents SELF-REDRAFT, an approach for eliciting a model's intrinsic exploration-exploitation balance in test-time code generation without interpreter feedback. It shows gains over Self-Refine and identifies feedback generation and discriminative judgment as the main areas needing improvement.

Findings

1. SELF-REDRAFT outperforms Self-Refine under the same iteration limits.
2. Significant room for improvement remains in feedback generation and discriminative judgment.
3. Balancing strategies vary across different language models.

Abstract

Test-time scaling without interpreter feedback is essential for real-world code generation scenarios where test cases are not readily available. While existing paradigms often rely on either greedy exploitation (i.e., iterative refinement) or stochastic exploration (i.e., relying on sample-based voting or reranking mechanisms), the balance between these two dimensions remains underexplored. To investigate the LLM's intrinsic ability to balance exploitation and exploration, we introduce SELF-REDRAFT, a framework built upon Self-Refine that encourages the model to propose new drafts for solutions that are fundamentally flawed. Our results show that SELF-REDRAFT consistently achieves better performance than Self-Refine when converged under the same maximum number of iterations. Still, we observe that significant room for improvement remains, largely due to two core aspects of current…
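
The control flow the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the verdict labels, the function names (`self_redraft`, `generate`, `critique`, `refine`), the redraft prompt wording, and the default iteration cap are all assumptions made for illustration, and the model calls are replaced by toy stand-ins so the loop runs without an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical verdict labels; the paper's actual prompt format is not shown here.
PASS, REFINE, REDRAFT = "pass", "refine", "redraft"


@dataclass
class Step:
    solution: str
    verdict: str
    feedback: str


def self_redraft(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], Tuple[str, str]],
    refine: Callable[[str, str, str], str],
    max_iters: int = 4,  # assumed cap, not a value from the paper
) -> Tuple[str, List[Step]]:
    """Self-Refine-style loop with an extra 'redraft' branch (illustrative only)."""
    solution = generate(task)
    history: List[Step] = []
    for _ in range(max_iters):
        # The model judges its own solution; no interpreter or test cases are used.
        verdict, feedback = critique(task, solution)
        history.append(Step(solution, verdict, feedback))
        if verdict == PASS:
            break  # converged: the model accepts the current solution
        if verdict == REDRAFT:
            # Exploration: discard the draft and write a new one from scratch.
            solution = generate(task + "\n\nAvoid this flawed approach:\n" + feedback)
        else:
            # Exploitation: keep the draft and patch it using the feedback.
            solution = refine(task, solution, feedback)
    return solution, history


# Toy stand-ins for the model calls, so the sketch is runnable as written.
def toy_generate(prompt: str) -> str:
    return "draft-B" if "Avoid" in prompt else "draft-A"


def toy_critique(task: str, solution: str) -> Tuple[str, str]:
    if solution == "draft-A":
        return REDRAFT, "wrong algorithm"
    if solution == "draft-B":
        return REFINE, "off-by-one"
    return PASS, ""


def toy_refine(task: str, solution: str, feedback: str) -> str:
    return solution + "-fixed"


final, trace = self_redraft("sum a list", toy_generate, toy_critique, toy_refine)
print(final, [step.verdict for step in trace])
```

Removing the `REDRAFT` branch reduces the loop to plain iterative refinement, which is the baseline the abstract compares against. In a real setting the three callables would wrap calls to the same language model with different prompts.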

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Advanced Software Engineering Methodologies