Correctness isn't Efficiency: Runtime Memory Divergence in LLM-Generated Code

Prateek Rajput; Yewei Song; Abdoul Aziz Bonkoungou; Iyiola E. Olatunji; Abdoul Kader Kabore; Jacques Klein; Tegawendé F. Bissyandé

arXiv:2601.01215 · cs.SE · February 3, 2026

Open Access

TL;DR

Correct solutions that large language models generate for the same task can differ substantially in runtime memory behavior. This paper measures that divergence, argues that it affects operational reliability, and suggests stability-aware selection methods.

Contribution

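The authors introduce a framework for quantifying memory stability in LLM-generated code, built on two metrics: Dynamic Mean Pairwise Distance (DMPD) at the solution level and the Model Instability Score (MIS) at the model level. They argue that stability matters for operational safety.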

Findings

01. Substantial runtime memory divergence among correct solutions.
02. Higher sampling temperature can increase instability.
03. Memory stability correlates with software complexity indicators.

Abstract

Large language models (LLMs) can generate programs that pass unit tests, but passing tests does not guarantee reliable runtime behavior. We find that different correct solutions to the same task can show very different memory and performance patterns, which can lead to hidden operational risks. We present a framework to measure execution-time memory stability across multiple correct generations. At the solution level, we introduce Dynamic Mean Pairwise Distance (DMPD), which uses Dynamic Time Warping to compare the shapes of memory-usage traces after converting them into Monotonic Peak Profiles (MPPs) to reduce transient noise. Aggregating DMPD across tasks yields a model-level Model Instability Score (MIS). Experiments on BigOBench and CodeContests show substantial runtime divergence among correct solutions. Instability often increases with higher sampling temperature even when pass@1…
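
The pipeline described in the abstract (memory trace, then Monotonic Peak Profile, then pairwise Dynamic Time Warping, then aggregation) can be sketched roughly as follows. This is an illustrative reading and not the authors' implementation. It assumes that an MPP is the running maximum of a trace, that DTW uses an absolute-difference cost, that DMPD is the plain mean over all solution pairs, and that MIS is the plain mean of DMPD over tasks. The excerpt above does not specify any trace normalization, resampling, or weighting the paper may apply, so none is included, and all function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def monotonic_peak_profile(trace):
    """Running maximum of a memory trace (assumed meaning of 'MPP')."""
    profile, peak = [], float("-inf")
    for sample in trace:
        peak = max(peak, sample)
        profile.append(peak)
    return profile


def dtw_distance(a, b):
    """Textbook Dynamic Time Warping between two non-empty sequences.

    Uses an absolute-difference local cost (an assumption) and keeps
    only two rows of the cost matrix at a time.
    """
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def dmpd(traces):
    """Mean pairwise DTW distance between the MPPs of correct solutions."""
    profiles = [monotonic_peak_profile(t) for t in traces]
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0  # fewer than two solutions: no divergence to measure
    return mean(dtw_distance(p, q) for p, q in pairs)


def model_instability_score(traces_by_task):
    """Unweighted mean of per-task DMPD (assumed aggregation)."""
    return mean(dmpd(traces) for traces in traces_by_task.values())
```

Here `traces_by_task` would map a task identifier to the memory traces of that task's correct generations, each trace being a list of memory samples in a fixed unit. Taking the running maximum first discards transient dips, which is consistent with the abstract's stated goal of reducing transient noise before comparing shapes.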

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Parallel Computing and Optimization Techniques