Investigating Software Aging in LLM-Generated Software Systems
César Santos, Ermeson Andrade, Roberto Natella

TL;DR
This study investigates the phenomenon of software aging in applications generated by Large Language Models, revealing significant degradation patterns over extended use and emphasizing the importance of considering aging in automated software development.
Contribution
The paper provides the first experimental analysis of software aging in LLM-generated software, highlighting degradation patterns and variability across applications.
Findings
Significant memory growth observed during the 50-hour load tests
Increased response time and performance instability
Aging severity varies by application type
Abstract
Automatically generated software, especially code produced by Large Language Models (LLMs), is increasingly adopted to accelerate development and reduce manual effort. However, little is known about the long-term reliability of such systems under sustained execution. In this paper, we experimentally investigate the phenomenon of software aging in applications generated by LLM-based tools. Using the Bolt platform and standardized prompts from Baxbench, we generated four service-oriented applications and subjected them to 50-hour load tests. Resource usage, response time, and throughput were continuously monitored to detect degradation patterns. The results reveal significant evidence of software aging, including progressive memory growth, increased response time, and performance instability across all applications. Statistical analyses confirm these trends and highlight variability in…
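The abstract states that statistical analyses confirmed the degradation trends but does not name the tests used. A technique commonly used in the software-aging literature is the Mann-Kendall trend test paired with Sen's slope estimator, applied to periodically sampled metrics such as memory usage or response time. The sketch below illustrates that technique under stated assumptions; it is not the authors' analysis pipeline, and the function names, the one-sample-per-minute interval, and the sample data are all hypothetical.

```python
import math
from collections import Counter
from statistics import median


def mann_kendall(samples: list[float]) -> tuple[float, float]:
    """Return (Z, two-sided p-value) of the Mann-Kendall trend test.

    Z > 0 with a small p-value suggests a monotonic upward trend,
    e.g. memory growth consistent with software aging.
    """
    n = len(samples)
    if n < 3:
        raise ValueError("need at least 3 samples")
    # S = (number of later samples that increase) - (number that decrease).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = samples[j] - samples[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the no-trend hypothesis, corrected for tied values.
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(samples).values())
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18
    if var_s == 0:  # every sample identical: no trend is detectable
        return 0.0, 1.0
    # Continuity-corrected normal approximation.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal CDF, computed via erf.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p


def sens_slope(samples: list[float], interval: float = 1.0) -> float:
    """Median of all pairwise slopes: a robust growth rate per time unit."""
    slopes = [
        (samples[j] - samples[i]) / ((j - i) * interval)
        for i in range(len(samples) - 1)
        for j in range(i + 1, len(samples))
    ]
    return median(slopes)


if __name__ == "__main__":
    # Hypothetical resident-memory samples (MB), assumed one per minute.
    rss_mb = [100 + 0.5 * k + (k % 3) for k in range(60)]
    z, p = mann_kendall(rss_mb)
    print(f"Z={z:.2f}, p={p:.3g}, slope={sens_slope(rss_mb):.2f} MB/min")
```

Sen's slope is used here instead of a least-squares fit because the median of pairwise slopes is less sensitive to isolated outliers, such as short latency spikes, which are common in long-running load tests.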
