AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation
Zhensu Sun, Xiaoning Du, Zhou Yang, Li Li, David Lo

TL;DR
This paper introduces AI-oriented grammar, exemplified by SimPy, a revised Python grammar designed to reduce token usage and improve computational efficiency for AI models without sacrificing code semantics or performance.
Contribution
The paper presents the first AI-oriented grammar for Python, SimPy, which minimizes tokens and enhances inference efficiency for large language models in code generation tasks.
Findings
SimPy reduces token usage by 13.5% with CodeLlama and 10.4% with GPT-4.
Programs in SimPy maintain identical AST structures to Python.
Models can perform as well or better using SimPy compared to standard Python.
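The second finding rests on an AST-equivalence criterion, and the first on a relative token reduction. The sketch below illustrates both using only Python's standard `ast` module. It does not use the SimPy parser: the two sources are ordinary Python with different layouts, and the helper names `same_ast` and `relative_reduction`, along with the token counts in the usage note, are hypothetical.

```python
import ast


def same_ast(src_a: str, src_b: str) -> bool:
    """Return True when both sources parse to the same abstract syntax tree.

    ast.dump() omits line and column attributes by default, so layout
    differences (spacing, blank lines, comments) do not affect the result.
    """
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


def relative_reduction(baseline_tokens: int, new_tokens: int) -> float:
    """Fraction of tokens saved relative to the baseline count."""
    return (baseline_tokens - new_tokens) / baseline_tokens


# Human-oriented layout: comment, blank line, four-space indentation.
readable = '''
def add(a, b):
    # sum two numbers
    total = a + b

    return total
'''

# Same program with the layout stripped to the minimum Python accepts.
compact = "def add(a,b):\n total=a+b\n return total\n"

print(same_ast(readable, compact))
```

As a usage example, a hypothetical program that needs 1000 tokens in one representation and 865 in another gives `relative_reduction(1000, 865)` of 0.135, which is the form in which the 13.5% figure above is expressed.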
Abstract
Artificial Intelligence (AI) models have emerged as another important audience for programming languages alongside humans and machines, as we enter the era of large language models (LLMs). LLMs can now perform well in coding competitions and even write programs like developers to solve various tasks, including mathematical problems. However, the grammar and layout of current programs are designed to cater to the needs of human developers -- with many grammar tokens and formatting tokens being used to make the code easier for humans to read. While this is helpful, such a design adds unnecessary computational work for LLMs, as each token they either use or produce consumes computational resources. To improve inference efficiency and reduce computational costs, we propose the concept of AI-oriented grammar. This aims to represent code in a way that better suits the working mechanism of AI…
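To make the notion of formatting tokens concrete, the sketch below counts tokens that exist only for layout or human readers, using Python's standard `tokenize` module as a stand-in. This is an assumption for illustration: an LLM's subword tokenizer splits text differently (for example, it also spends tokens on spaces), so the counts here are not the paper's measurements. The names `FORMATTING_TYPES`, `lexer_tokens`, and `formatting_token_count` are hypothetical.

```python
import io
import tokenize

# Lexer token types that carry no meaning for the parser:
# NL marks a non-logical line break (blank or comment-only line).
FORMATTING_TYPES = {tokenize.NL, tokenize.COMMENT}


def lexer_tokens(source: str) -> list:
    """Tokenize Python source with the standard-library lexer."""
    return list(tokenize.generate_tokens(io.StringIO(source).readline))


def formatting_token_count(source: str) -> int:
    """Count tokens that serve only layout or human readability."""
    return sum(1 for tok in lexer_tokens(source) if tok.type in FORMATTING_TYPES)


# Human-oriented layout with a comment and a blank line.
readable = (
    "def scale(x, k):\n"
    "    # multiply x by k\n"
    "    y = x * k\n"
    "\n"
    "    return y\n"
)

# Same program with the comment and blank line removed.
compact = "def scale(x,k):\n y=x*k\n return y\n"

print(formatting_token_count(readable), formatting_token_count(compact))
print(len(lexer_tokens(readable)), len(lexer_tokens(compact)))
```

The compact version drops every formatting-only token while remaining valid Python, which is the kind of saving the abstract attributes to a grammar that no longer has to serve human readers.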
Taxonomy
Topics: AI-based Problem Solving and Planning · Software Engineering Research · Engineering and Information Technology
Methods: Attention Is All You Need · Sparse Evolutionary Training · Dropout · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Dense Connections
