Can LLMs Produce Better Object-Oriented Designs than Human-Involved Development?
Zushuai Zhang, Elliott Wen, Ewan Tempero

TL;DR
This study compares object-oriented design quality across three groups of projects: human-involved projects from before widespread LLM use, human-involved projects from after it, and projects generated entirely by LLMs. The comparison reveals strengths and weaknesses of LLM-generated designs.
Contribution
It provides a comparative analysis of OOD quality across human-involved and AI-generated projects, highlighting the importance of human guidance in LLM-based design.
Findings
PureAI projects have lower code smell density but tend to oversimplify design.
PostAI projects are closer to PureAI projects and likewise show signs of oversimplification.
Human guidance remains crucial for effective object-oriented design with LLMs.
Abstract
Background: Large Language Models (LLMs) are increasingly used for code generation. However, their ability to generate multi-class projects that require object-oriented design (OOD) remains unclear, especially relative to projects developed with human involvement. Aims: The primary objective of this study is to compare OOD quality in projects from three authorship conditions: PreAI (human-involved projects produced before widespread LLM use), PostAI (human-involved projects produced after widespread LLM use), and PureAI (projects generated end-to-end by contemporary LLMs). Method: We conducted a comparative case study on a postgraduate Java assignment. Two offerings of the same assignment were selected as the PreAI and PostAI datasets. PureAI projects were generated using three contemporary LLMs. We analyzed OOD quality using project-level OOD metrics, code smell density, and domain…
