Measuring LLM Code Generation Stability via Structural Entropy
Yewei Song, Tiezhu Sun, Xunzhu Tang, Prateek Rajput, Tegawende F. Bissyande, Jacques Klein

TL;DR
This paper introduces a reference-free method for measuring the stability of LLM-generated code. It applies structural entropy to the abstract syntax trees (ASTs) of generated programs to compare how consistent and robust different models are.
Contribution
It extends structural-entropy concepts to code, proposing new metrics based on AST analysis that are language-agnostic, execution-independent, and provide nuanced stability evaluation.
Findings
AST-based entropy metrics differentiate model stability.
Metrics are reference-free and language-agnostic.
Benchmarking shows nuanced stability differences among LLMs.
Abstract
Assessing the stability of code generation from large language models (LLMs) is essential for judging their reliability in real-world development. We extend prior "structural-entropy concepts" to the program domain by pairing entropy with abstract syntax tree (AST) analysis. For any fixed prompt, we collect the multiset of depth-bounded subtrees of the AST of each generated program and treat their relative frequencies as a probability distribution. We then measure stability in two complementary ways: (i) Jensen-Shannon divergence, a symmetric, bounded indicator of structural overlap, and (ii) a Structural Cross-Entropy ratio that highlights missing high-probability patterns. Both metrics admit structural-only and token-aware variants, enabling separate views on control-flow shape and identifier-level variability. Unlike pass@k, BLEU, or CodeBLEU, our metrics are reference-free,…
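The pipeline described in the abstract (depth-bounded AST subtrees, relative frequencies, Jensen-Shannon divergence) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `subtree_signature`, `subtree_distribution`, and `js_divergence`, the string serialization of subtrees, the default depth of 2, and the use of Python's built-in `ast` parser are all assumptions made for illustration. The Structural Cross-Entropy ratio is left out because this excerpt does not define it.

```python
import ast
import math
from collections import Counter


def subtree_signature(node, depth, with_tokens=False):
    """Serialize the subtree rooted at `node`, cut off `depth` levels below it.

    Hypothetical encoding: node type names, optionally with identifier and
    constant text appended for a token-aware view.
    """
    label = type(node).__name__
    if with_tokens:
        if isinstance(node, ast.Name):
            label += ":" + node.id
        elif isinstance(node, ast.Constant):
            label += ":" + repr(node.value)
    if depth == 0:
        return label
    children = [
        subtree_signature(child, depth - 1, with_tokens)
        for child in ast.iter_child_nodes(node)
    ]
    return label + "(" + ",".join(children) + ")" if children else label


def subtree_distribution(source, depth=2, with_tokens=False):
    """Relative frequencies of depth-bounded subtrees in one program."""
    tree = ast.parse(source)
    counts = Counter(
        subtree_signature(node, depth, with_tokens) for node in ast.walk(tree)
    )
    total = sum(counts.values())
    return {sig: n / total for sig, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded by 1."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        # KL(a || m); m[k] > 0 whenever a[k] > 0, so the ratio is defined.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Two generations that differ only in an identifier, and one with a loop.
gen_a = "def f(x):\n    return x + 1\n"
gen_b = "def f(y):\n    return y + 1\n"
gen_c = "def f(x):\n    for i in range(x):\n        x += i\n    return x\n"

structural_ab = js_divergence(subtree_distribution(gen_a), subtree_distribution(gen_b))
token_ab = js_divergence(
    subtree_distribution(gen_a, with_tokens=True),
    subtree_distribution(gen_b, with_tokens=True),
)
structural_ac = js_divergence(subtree_distribution(gen_a), subtree_distribution(gen_c))
```

In this sketch, `gen_a` and `gen_b` have identical structural distributions, so only the token-aware variant separates them, while `gen_c` differs from `gen_a` in control-flow shape and registers a non-zero structural divergence. A stability score over many samples for one prompt could then aggregate such pairwise values, though how the paper aggregates them is not stated in this excerpt.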
Taxonomy
Topics: Semiconductor materials and devices · Advancements in Photolithography Techniques · Copper Interconnects and Reliability
