Quality Assurance of LLM-generated Code: Addressing Non-Functional Quality Characteristics
Xin Sun, Daniel Ståhl, Kristian Sandahl, Christoph Kessler

TL;DR
This study investigates the non-functional quality characteristics of LLM-generated code. It reveals gaps between research focus, industry priorities, and actual model behavior, which underscore the need for integrated quality assurance.
Contribution
It provides a comprehensive multi-method analysis of non-functional qualities in LLM-generated code, highlights misalignments between research focus, industry priorities, and model behavior, and argues for quality assurance mechanisms.
Findings
Research mainly emphasizes security, performance, and maintainability.
Practitioners prioritize maintainability and readability.
Optimizing non-functional qualities via prompts is unstable in practice.
Abstract
In recent years, large language models have been widely integrated into software engineering workflows, supporting tasks like code generation. While prior evaluations focus on functional correctness, understanding of the non-functional quality characteristics of generated code remains limited. Guided by the ISO/IEC 25010 quality model, this study adopts a multi-method approach comprising three complementary elements: a literature review of 109 papers, two industry workshops with practitioners from multiple organizations, and an empirical analysis of patching real-world software issues using three LLMs. Motivated by insights from both the literature and practitioners, the empirical study examined the quality of generated patches regarding security, maintainability, and performance efficiency, which were identified as critical code-level quality attributes. Our results…
