The Readability Spectrum: Patterns, Issues, and Prompt Effects in LLM-Generated Code
Hengzhi Ye, Fengyuan Ran, Weiwei Xu, Minghui Zhou

TL;DR
This study systematically evaluates the readability of LLM-generated code, compares it with human-written code, and examines how prompt design influences readability, revealing both potential and challenges for integrating LLMs into software development.
Contribution
The study introduces a comprehensive readability model for code, evaluates LLM-generated code across thousands of scenarios, and analyzes prompt factors affecting readability.
Findings
LLM-generated code has comparable readability to human-written code.
Distinct readability issues are identified in LLM-generated code.
Prompt design has limited but notable influence on code readability.
Abstract
As Large Language Models (LLMs) are transforming software development, the functional quality of generated code has become a central focus, leaving readability, one of the critical non-functional attributes, understudied. Given that LLM-generated code still needs human review before adoption, it is important to understand its readability, especially compared with human-written code, and the role of prompt design in shaping it. We therefore set out to conduct a systematic investigation into the readability of LLM-generated code. To systematically quantify code readability, we establish a comprehensive readability model that synthesizes textual, structural, program, and visual features of code. Based on the model, we evaluate the readability of code generated by mainstream LLMs under 5,869 scenarios extracted from large code bases including World of Code (WoC) and LeetCode. We find that…
