How Many Tries Does It Take? Iterative Self-Repair in LLM Code Generation Across Model Scales and Benchmarks

Johin Johny Arimbur

arXiv:2604.10508·cs.SE·April 14, 2026

TL;DR

This study shows that iterative self-repair, in which execution errors are fed back to the model for correction, improves code generation pass rates for all seven evaluated models on both HumanEval and MBPP Sanitized.

Contribution

It provides the first comprehensive comparison of self-repair across multiple models, architectures, and benchmarks, showing modern instruction-tuned models succeed with prompting alone.

Findings

01

Self-repair improves pass rates for every model tested: +4.9 to +17.1 percentage points on HumanEval and +16.0 to +30.0 percentage points on MBPP.

02

Gemini 2.5 Flash achieves the highest final pass rates: 96.3% on HumanEval and 93.8% on MBPP.

03

Most repair gains occur within the first two attempts.

Abstract

Large language models frequently fail to produce correct code on their first attempt, yet most benchmarks evaluate them in a single-shot setting. We investigate iterative self-repair (feeding execution errors back to the model for correction) across seven models spanning three families and both open-weight and proprietary providers: Llama 3.1 8B, Llama 3.3 70B, Llama 4 Scout (MoE, 16 experts), Llama 4 Maverick (MoE, 128 experts), Qwen3 32B, Gemini 2.5 Flash, and Gemini 2.5 Pro. On HumanEval (164 problems) and MBPP Sanitized (257 problems) with up to five attempts, self-repair universally improves pass rates: +4.9 to +17.1 pp on HumanEval and +16.0 to +30.0 pp on MBPP. Gemini 2.5 Flash achieves the highest final pass rates (96.3% HumanEval, 93.8% MBPP). Most gains concentrate in the first two rounds. Error-type analysis shows assertion errors (logical mistakes) are the hardest to repair…
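
The self-repair loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual harness: the `generate` callable, the feedback format, and the helper names `run_tests` and `self_repair` are all assumptions introduced here, and the stub model simply stands in for a real LLM call. Only the attempt budget of five comes from the abstract.

```python
from typing import Callable, Optional, Tuple


def run_tests(code: str, test_code: str) -> Optional[str]:
    """Run candidate code plus its tests; return None on success, else an error string.

    Hypothetical helper. A real harness would run this in a sandboxed
    subprocess with a timeout rather than calling exec() in-process.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)       # define the candidate solution
        exec(test_code, namespace)  # run the benchmark's assertions against it
    except Exception as exc:        # includes SyntaxError and AssertionError
        return f"{type(exc).__name__}: {exc}"
    return None


def self_repair(
    prompt: str,
    generate: Callable[[str, Optional[str]], str],
    test_code: str,
    max_attempts: int = 5,  # the abstract reports "up to five attempts"
) -> Tuple[bool, int, str]:
    """Ask the model for code, feeding each execution error back until tests pass."""
    feedback: Optional[str] = None
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt, feedback)
        error = run_tests(code, test_code)
        if error is None:
            return True, attempt, code
        # Assumed feedback format: the failed code followed by the error text.
        feedback = f"Previous attempt:\n{code}\n\nExecution error:\n{error}"
    return False, max_attempts, code


def stub_generate(prompt: str, feedback: Optional[str]) -> str:
    """Stand-in for an LLM: wrong on the first try, correct once it sees feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


passed, attempts, final_code = self_repair(
    "Write add(a, b) that returns the sum.",
    stub_generate,
    "assert add(2, 3) == 5",
)
print(passed, attempts)
```

With the stub above, the first attempt fails its assertion and the second passes, so the loop stops after two attempts. Recording the attempt number at which each problem first passes is one way to measure how gains are distributed across rounds.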
