Precision or Peril: A PoC of Python Code Quality from Quantized Large Language Models
Eric L. Melin, Adam J. Torek, Nasir U. Eisty, Casey Kennington

TL;DR
This paper evaluates how quantization affects code quality in smaller LLMs, highlighting the need for careful validation due to potential quality and maintainability issues.
Contribution
It provides a proof of concept analyzing the impact of 8-bit and 4-bit quantization on code generation quality in open-source LLMs.
Findings
Smaller LLMs can generate functional code, but their benchmark performance is limited.
Quantization effects on code quality are variable and can introduce issues.
Generated code exhibits concerns in quality and maintainability.
Abstract
Context: Large Language Models (LLMs) like GPT-5 and LLaMA-405b exhibit advanced code generation abilities, but their deployment demands substantial computational resources and energy. Quantization can reduce memory footprint and hardware requirements, yet it may degrade code quality. Objective: As a proof of concept (PoC), this study investigates the code generation performance of smaller LLMs, examines the effect of quantization, and identifies common code quality issues. Method: Four open-source LLMs are evaluated on Python benchmarks using code similarity metrics, with an analysis of 8-bit and 4-bit quantization alongside static code quality assessment. Results: While smaller LLMs can generate functional code, their benchmark performance is limited. Quantization impacts are variable, and the generated code exhibits quality and maintainability concerns. Conclusions: LLM-generated code should be…
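The memory trade-off mentioned in the abstract can be made concrete with a small sketch. The Python below assumes a generic symmetric (absmax) per-tensor quantization scheme. That scheme is not necessarily the method or toolkit the authors used. The function names and the 7B-parameter example are hypothetical. The sketch shows how 8-bit and 4-bit storage shrink the weight footprint relative to 16-bit weights, and how rounding error grows as the bit width drops.

```python
"""Illustrative sketch: symmetric absmax quantization of weights to b-bit
signed integers, plus a rough weight-memory estimate.

Assumption: a generic textbook scheme, not the specific quantization
method or library used in the paper. All names here are hypothetical.
"""
from typing import List, Tuple


def quantize_absmax(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Map floats to signed ints in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats; the difference from the input is the lost precision."""
    return [v * scale for v in q]


def weight_memory_gb(n_params: float, bits: int) -> float:
    """Memory for the weights alone (ignores activations, KV cache, and stored scales)."""
    return n_params * bits / 8 / 1e9


if __name__ == "__main__":
    w = [0.82, -0.41, 0.05, -1.27, 0.33]
    for bits in (8, 4):
        q, s = quantize_absmax(w, bits)
        err = max(abs(a - b) for a, b in zip(w, dequantize(q, s)))
        print(f"{bits}-bit: q={q} max_abs_error={err:.4f}")
    # Hypothetical 7B-parameter model, weights only.
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: {weight_memory_gb(7e9, bits):.1f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts weight storage by a factor of four. The per-weight rounding error is bounded by half the scale, and the scale grows as the number of integer levels shrinks. This is one intuition for why aggressive quantization can affect output quality. It does not by itself predict the variable effects on code quality that the paper reports.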
