Transformers are Efficient Compilers, Provably
Xiyu Zhai, Runlong Zhou, Liao Zhang, Simon Shaolei Du

TL;DR
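This paper gives a formal analysis showing that transformers can perform compiler tasks (AST construction, symbol resolution, and type analysis) with a parameter count that grows only logarithmically in the input sequence length, provided the input code has bounded AST depth and bounded type-inference depth. It also argues, through theoretical proofs and empirical validation, that transformers outperform RNNs on these tasks.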
Contribution
It introduces a formal framework and a domain-specific language to prove transformers' expressive power in compiler tasks, highlighting their efficiency and exponential advantage over RNNs.
Findings
Transformers need a number of parameters that grows only logarithmically with input sequence length for certain compiler tasks.
RNNs need a number of parameters at least linear in input sequence length, which is an exponential gap.
Empirical results confirm transformers' efficiency in Mini-Husky compiler tasks.
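To make the "exponential gap" in the findings above concrete, here is a minimal arithmetic sketch. It is not taken from the paper: the functions `log_budget` and `linear_budget` are hypothetical, and they ignore all constants and any other factors the actual bounds may depend on. They only compare a quantity growing like log2(L) with one growing like L.

```python
import math


def log_budget(seq_len: int) -> int:
    """Hypothetical budget growing like ceil(log2(L)); constants are ignored."""
    return max(1, math.ceil(math.log2(seq_len)))


def linear_budget(seq_len: int) -> int:
    """Hypothetical budget growing like L; constants are ignored."""
    return seq_len


def gap_table(lengths):
    """Return (L, logarithmic budget, linear budget) for each length L."""
    return [(n, log_budget(n), linear_budget(n)) for n in lengths]


if __name__ == "__main__":
    for n, log_b, lin_b in gap_table([16, 1024, 1_048_576]):
        print(f"L={n:>9}  log-scale={log_b:>3}  linear-scale={lin_b:>9}")
```

Since L = 2^(log2 L), the linear quantity is exponential in the logarithmic one, which is the sense in which a logarithmic-versus-linear separation is called exponential.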
Abstract
Transformer-based large language models (LLMs) have demonstrated surprisingly robust performance across a wide range of language-related tasks, including programming language understanding and generation. In this paper, we take the first steps towards a formal investigation of using transformers as compilers from an expressive power perspective. To this end, we introduce a representative programming language, Mini-Husky, which encapsulates key features of modern C-like languages. We show that if the input code sequence has a bounded depth in both the Abstract Syntax Tree (AST) and type inference (reasonable assumptions based on the clean code principle), then the number of parameters required by transformers depends only on the logarithm of the input sequence length to handle compilation tasks, such as AST construction, symbol resolution, and type analysis. A significant technical…
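The bounded-depth assumption in the abstract can be illustrated with a rough proxy. The sketch below is an assumption-laden illustration, not the paper's definition of AST depth: the hypothetical helper `max_nesting_depth` only counts how deeply brackets nest in a source string, which tends to stay small in cleanly written code even as the code gets longer.

```python
def max_nesting_depth(source: str) -> int:
    """Return the deepest bracket nesting in `source`.

    This is only a proxy for AST depth: it tracks (), [] and {} and
    ignores every other syntactic construct.
    """
    closers = {")": "(", "]": "[", "}": "{"}
    stack = []
    deepest = 0
    for ch in source:
        if ch in "([{":
            stack.append(ch)
            deepest = max(deepest, len(stack))
        elif ch in closers:
            # A closer must match the most recent unclosed opener.
            if not stack or stack[-1] != closers[ch]:
                raise ValueError(f"unbalanced delimiter: {ch!r}")
            stack.pop()
    if stack:
        raise ValueError("unclosed delimiter at end of input")
    return deepest


def within_depth_bound(source: str, bound: int) -> bool:
    """Check the (proxy) bounded-depth condition for a hypothetical bound."""
    return max_nesting_depth(source) <= bound
```

For example, `max_nesting_depth("fn f() { if x { g(a[0]) } }")` is 4, and repeating that function a thousand times makes the input much longer without increasing the depth.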
Peer Reviews
Decision: Submitted to ICLR 2025
* The authors attempt a formal analysis of Transformers.
(1) The paper is fundamentally incomplete: while the authors claim to introduce Mini-Husky as a pared-down programming language, they do not do this at all in the main text of the paper, and even in the extensive appendix, they do not actually provide a full definition of its syntax or semantics. As an example, note that the BNF in Appendix D does not in any way correspond to the example given 5 lines under it, which uses keywords such as `mut` and assignment operators such as `+=`, `.` and `assert`
The paper is original in attempting to use a "bottom-up" formalism for proving the asymptotic memory requirements of LLMs for program compilation tasks.
The paper does not convince the reader that its main claims are supported by proof or empirical evidence. Alternatively, the claims are moot since they only apply to a language whose semantic properties can be checked with bounded complexity (Mini-Husky). The exposition of this paper is too intricate, and most of the body of the paper is spent introducing background definitions and not the actual substance of the claims. * Theorem 1 is "proven" essentially by repeating the theorem statement. T
While transformer models have been extensively used to compile and generate code, how (well) they proceed with such tasks remains understudied, which is in part due to the significant gap between neural and symbolic methods in terms of the level of abstraction (which is further amplified in the context of compilation, as the authors have identified). Thus, I recognize this paper as an admirable and overall successful shot at the task of characterizing how transformers carry out compilation. Alt
While the paper is overall coherent, I did find some bits to be unintuitive (due to the order in which concepts are introduced) and even obtuse (due to poor prose). For instance, I would suggest modifying/reordering Section 5.4 to first motivate the use of key abstractions (e.g., computation graphs), and then go over what constitutes types in Cybertron, followed by how such types constrain the use of abstractions and ensure the soundness of propositions (e.g., that there exists a transformer enc
Taxonomy
Topics: Parallel Computing and Optimization Techniques
