Self-Improving Code Generation via Semantic Entropy and Behavioral Consensus

Huan Zhang; Wei Cheng; Wei Hu

arXiv:2603.29292·cs.SE·April 1, 2026

TL;DR

This paper introduces ConSelf, a self-improving method for code generation models. It uses code semantic entropy to build a learning curriculum and consensus-driven preference optimization to improve performance without external supervision.

Contribution

The paper proposes a self-improvement framework for code generation models that combines code semantic entropy with consensus-driven preference optimization.

Findings

01. ConSelf outperforms baseline models on various benchmarks.

02. Semantic entropy effectively identifies learnable problems for curriculum construction.

03. Consensus-driven optimization reduces noise in self-generated supervision.

Abstract

Improving the code generation capabilities of large language models (LLMs) typically relies on supervised fine-tuning or preference optimization, both of which require costly external resources such as powerful teacher models or reliable test units. However, in real-world scenarios, it is much harder to obtain reference solutions and test oracles than problem descriptions and test inputs. In this paper, we tackle a challenging yet realistic question: Can a code language model improve itself without access to a superior teacher and a test oracle? To answer this, we propose ConSelf, a self-improving approach built upon two key ideas. First, we introduce code semantic entropy, a novel metric that measures problem-level uncertainty by assessing the functional diversity of program behaviors, enabling a curriculum construction with the most learnable problems. Second, we present…
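
To make the first idea concrete, here is a minimal sketch of how an entropy over program behaviors could be computed. It is an illustration under stated assumptions, not the paper's definition: the excerpt above does not give the exact formula, so this sketch assumes that sampled programs are grouped by exact equality of their outputs on the available test inputs, and that the entropy is the natural-log Shannon entropy of the resulting group sizes. The names `behavior_signature` and `code_semantic_entropy` are hypothetical.

```python
import math
from collections import Counter
from typing import Any, Callable, Sequence


def behavior_signature(program: Callable[[Any], Any], test_inputs: Sequence[Any]) -> tuple:
    """Run one candidate on every test input and record its observable behavior."""
    outcomes = []
    for x in test_inputs:
        try:
            outcomes.append(("ok", repr(program(x))))
        except Exception as exc:  # a crash is also a behavior
            outcomes.append(("error", type(exc).__name__))
    return tuple(outcomes)


def code_semantic_entropy(programs: Sequence[Callable[[Any], Any]],
                          test_inputs: Sequence[Any]) -> float:
    """Shannon entropy (in nats) over clusters of behaviorally identical programs.

    Assumption: two programs belong to the same cluster when they produce
    identical outputs on all test inputs. No reference outputs are needed.
    """
    signatures = [behavior_signature(p, test_inputs) for p in programs]
    counts = Counter(signatures)
    total = len(signatures)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Four hypothetical samples for "double the input": three agree, one squares.
candidates = [
    lambda x: x * 2,
    lambda x: x + x,
    lambda x: x ** 2,
    lambda x: 2 * x,
]
inputs = [1, 2, 3]
entropy = code_semantic_entropy(candidates, inputs)
```

Under these assumptions, an entropy of zero means every sample behaves identically on the test inputs, while a high value means the samples disagree widely. How such scores are thresholded to select "the most learnable problems" is not stated in this excerpt, so the sketch stops at computing the score.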
