The Hidden Cost of Readability: How Code Formatting Silently Consumes Your LLM Budget
Dangfeng Pan, Zhensu Sun, Cenyuan Zhang, David Lo, Xiaoning Du

TL;DR
This study shows that removing unnecessary code formatting elements can cut LLM input tokens by about a quarter, and computational costs with them, without degrading model performance, offering a practical optimization for code-related AI tasks.
Contribution
The paper provides a comprehensive empirical analysis of code formatting's impact on LLM performance and introduces a tool for efficient code formatting removal.
Findings
Removing formatting reduces input tokens by 24.5% on average.
LLM performance remains stable despite formatting removal.
Prompting and fine-tuning can cut output length by up to 36.1% without losing correctness.
Abstract
Source code is usually formatted with elements like indentation and newlines to improve readability for human developers. However, these visual aids do not seem to benefit large language models (LLMs) in the same way, since an LLM processes code as a linear sequence of tokens. Furthermore, these additional tokens can increase computational costs and response times for LLMs. If such formatting elements are non-essential to LLMs, we can reduce these costs by removing them from the code. To determine the role played by formatting elements, we conduct a comprehensive empirical study of the impact of code formatting on LLM performance and efficiency. Through large-scale experiments on Fill-in-the-Middle code completion tasks across four programming languages (Java, Python, C++, C#) and ten LLMs, including both commercial and open-source models, we…
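The core idea, stripping whitespace and newlines that exist only for human readers, can be illustrated with a minimal sketch for a C-family language such as Java. This is not the authors' tool: `TOKEN_RE`, `OPERATOR_CHARS`, and `remove_formatting` are hypothetical names. The sketch assumes the input has no Java text blocks and no C/C++ preprocessor directives, which are newline-sensitive. It also drops comments, because a `//` comment would otherwise swallow the rest of the code once newlines are gone. That is a choice of this sketch, not necessarily of the paper. It does not apply to Python, where indentation is syntactically significant.

```python
import re

# Simplified tokenizer for C-family code (assumption: no text blocks,
# no preprocessor directives). Order matters: comments and literals are
# matched before whitespace so their contents are never altered.
TOKEN_RE = re.compile(
    r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)      # line or block comment
  | (?P<string>"(?:\\.|[^"\\\n])*")      # string literal, kept verbatim
  | (?P<char>'(?:\\.|[^'\\\n])*')        # character literal, kept verbatim
  | (?P<space>\s+)                       # formatting whitespace
  | (?P<word>[\w$]+)                     # identifiers, keywords, numbers
  | (?P<other>.)                         # operators and punctuation
    """,
    re.VERBOSE | re.DOTALL,
)

# Characters that could fuse into a different operator if made adjacent,
# e.g. "a - -b" must not become "a--b".
OPERATOR_CHARS = set("+-*/%&|^<>=!~?:")


def _kind(ch: str) -> str:
    """Classify a boundary character of a token."""
    if ch.isalnum() or ch in "_$":
        return "word"
    if ch in OPERATOR_CHARS:
        return "op"
    return "punct"


def remove_formatting(source: str) -> str:
    """Remove formatting whitespace (and, in this sketch, comments)."""
    out: list[str] = []
    pending_space = False
    for m in TOKEN_RE.finditer(source):
        if m.lastgroup in ("space", "comment"):
            # Comments act as token separators, just like whitespace.
            pending_space = True
            continue
        text = m.group()
        if pending_space and out:
            prev_kind, next_kind = _kind(out[-1][-1]), _kind(text[0])
            # Keep one space only where two tokens would otherwise merge,
            # e.g. "int x" or "a- -b"; drop it everywhere else.
            if prev_kind == next_kind and prev_kind != "punct":
                out.append(" ")
        out.append(text)
        pending_space = False
    return "".join(out)


if __name__ == "__main__":
    java = """public class A {
    int add(int a, int b) {
        // sum
        return a + b;
    }
}"""
    print(remove_formatting(java))
```

The rule is deliberately conservative: a separator survives only when both neighbouring characters are identifier characters or both are operator characters, so the output may keep a few spaces a real lexer-aware tool could drop, but it never merges two tokens into one.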
