Approximation of Permutation Invariant Polynomials by Transformers: Efficient Construction in Column-Size
Naoki Takeshita, Masaaki Imaizumi

TL;DR
This paper explores how transformers can efficiently approximate column-symmetric polynomials, providing theoretical insights into their expressive power and establishing explicit relationships between network size and approximation ability.
Contribution
It analyzes the capacity of transformers to approximate column-symmetric polynomials, extending the understanding of their expressive power to symmetric algebraic functions.
Findings
Transformers can approximate column-symmetric polynomials efficiently with respect to network size.
Explicit bounds relate transformer size to approximation accuracy.
The study bridges algebraic properties of polynomials with neural network approximation theory.
Abstract
Transformers are a type of neural network that has demonstrated remarkable performance across various domains, particularly in natural language processing tasks. Motivated by this success, research on the theoretical understanding of transformers has garnered significant attention. A notable example is the mathematical analysis of their approximation power, which validates the empirical expressive capability of transformers. In this study, we investigate the ability of transformers to approximate column-symmetric polynomials, an extension of symmetric polynomials that take matrices as input. Consequently, we establish an explicit relationship between the size of the transformer network and its approximation capability, leveraging the parameter efficiency of transformers and their compatibility with symmetry by focusing on the algebraic properties of symmetric polynomials.
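To make the term concrete, the sketch below assumes that "column-symmetric" means a polynomial of a matrix input whose value does not change when the columns are permuted, as the title's "permutation invariant" suggests. The functions `column_sum_poly` and `first_entry_poly` are hypothetical examples chosen for illustration. They are not taken from the paper, and the brute-force check tests a single input rather than proving symmetry.

```python
from itertools import permutations


def column_sum_poly(X):
    """Hypothetical column-symmetric polynomial: sum over columns j of x_1j * x_2j + x_1j^2."""
    n_cols = len(X[0])
    return sum(X[0][j] * X[1][j] + X[0][j] ** 2 for j in range(n_cols))


def first_entry_poly(X):
    """Hypothetical polynomial that is NOT column-symmetric: it mixes two fixed column positions."""
    return X[0][0] * X[1][1]


def permute_columns(X, perm):
    """Return a copy of X (a list of rows) with its columns reordered by perm."""
    return [[row[j] for j in perm] for row in X]


def is_column_symmetric(f, X, tol=1e-9):
    """Check on this one input X that f is unchanged under every column permutation."""
    base = f(X)
    n_cols = len(X[0])
    return all(
        abs(f(permute_columns(X, perm)) - base) <= tol
        for perm in permutations(range(n_cols))
    )


# A 2 x 3 input matrix: two rows, three columns.
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

print(is_column_symmetric(column_sum_poly, X))   # invariant on this input
print(is_column_symmetric(first_entry_poly, X))  # changes when columns are swapped
```

Summing the same per-column expression over all columns is the simplest way to get such an invariant, which is why the first function passes the check and the second does not.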
Taxonomy
Topics: graph theory and CDMA systems · Coding theory and cryptography · semigroups and automata theory
