PELLI: Framework to effectively integrate LLMs for quality software generation
Rasmus Krebs, Somnath Mazumdar

TL;DR
This paper introduces PELLI, a comprehensive framework for evaluating and integrating LLMs into software development by assessing multiple nonfunctional quality metrics across different domains.
Contribution
The paper presents PELLI, a novel iterative assessment framework that evaluates LLM-generated code on maintainability, performance, and reliability, extending prior work focused mainly on reliability.
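To make the idea of an iterative, multi-metric assessment concrete, the following is a minimal sketch and not the authors' implementation. The names `QualityReport`, `assess`, `refine`, and the `generate` callable (a stand-in for any LLM call) are hypothetical, and the three metrics are crude proxies chosen for illustration: branch count for maintainability, wall-clock time for performance, and test pass rate for reliability.

```python
import ast
import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical report type; the paper's actual metrics are not specified here.
@dataclass
class QualityReport:
    maintainability: int  # 1 + number of branching nodes (lower is simpler)
    performance: float    # seconds spent running all test cases (lower is better)
    reliability: float    # fraction of test cases passed (higher is better)

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp, ast.IfExp)

def assess(source: str, func_name: str, cases: list) -> QualityReport:
    """Score one candidate on the three proxy metrics."""
    tree = ast.parse(source)
    complexity = 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))
    namespace: dict = {}
    # Assumption: the candidate is trusted. Real generated code needs a sandbox.
    exec(compile(tree, "<candidate>", "exec"), namespace)
    func = namespace[func_name]
    passed = 0
    start = time.perf_counter()
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed case
    elapsed = time.perf_counter() - start
    return QualityReport(complexity, elapsed, passed / len(cases))

def refine(generate: Callable[[str], str], prompt: str, func_name: str,
           cases: list, max_rounds: int = 3):
    """Regenerate up to max_rounds times, keeping the best-scoring candidate."""
    best = None
    for _ in range(max_rounds):
        source = generate(prompt)
        report = assess(source, func_name, cases)
        key = (report.reliability, -report.maintainability, -report.performance)
        if best is None or key > best[0]:
            best = (key, source, report)
        if report.reliability == 1.0:
            break  # all tests pass; stop iterating
        # Feed the measured result back into the next prompt (illustrative wording).
        prompt = f"{prompt}\nThe previous attempt passed {report.reliability:.0%} of the tests."
    return best[1], best[2]
```

Ranking by reliability first, then simplicity, then speed is one possible ordering assumed for this sketch; a real evaluation would likely weight or report the metrics separately.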
Findings
GPT-4T and Gemini performed slightly better than the other evaluated LLMs across metrics.
Prompt design significantly influences code quality.
Application domains show varied scores across metrics.
Abstract
Recent studies have revealed that LLMs, when appropriately prompted and configured, demonstrate mixed results. Such results often meet or exceed baseline performance. However, these comparisons have two primary issues: they mostly considered only reliability as a comparison metric, and they selected only a few LLMs (such as Codex and ChatGPT) for comparison. This paper proposes a comprehensive code quality assessment framework called Programmatic Excellence via LLM Iteration (PELLI). PELLI is an iterative analysis-based process that upholds high-quality code changes. We extend the state of the art by performing a comprehensive evaluation that generates quantitative metrics for three primary nonfunctional requirements (maintainability, performance, and reliability) across five popular LLMs. For PELLI's applicability, we selected three application…
Taxonomy
Topics: Software Engineering Research · Artificial Intelligence in Healthcare and Education · Software Engineering Techniques and Practices
