On the Expressive Power of Floating-Point Transformers
Sejun Park, Yeachan Park, Geonho Hwang

TL;DR
This paper investigates the expressive power of floating-point transformers, revealing their capabilities and limitations in representing permutation-equivariant and non-equivariant functions under finite-precision arithmetic.
Contribution
It extends theoretical understanding by analyzing how floating-point constraints affect the representability of functions in transformers, including the impact of positional encoding.
Findings
Floating-point transformers can represent some non-permutation-equivariant functions without positional encoding.
They can represent all permutation-equivariant functions when sequence length is bounded.
Large sequence lengths limit the representability of permutation-equivariant functions.
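One intuition for the sequence-length finding, offered as an illustration and not as the authors' argument: a finite-precision accumulator eventually stops registering small contributions. The sketch below uses Python's built-in float (IEEE 754 double precision), which is an assumption chosen for convenience. The page does not say which formats the paper considers, and lower-precision formats would hit this threshold much sooner. The helper `add_unit_tokens` is a hypothetical name.

```python
def add_unit_tokens(start, n_tokens):
    """Add 1.0 to a running total n_tokens times, rounding after each step."""
    total = start
    for _ in range(n_tokens):
        total = total + 1.0
    return total

# Above 2**53, float64 can no longer represent every integer,
# so adding 1.0 to 2**53 rounds back to 2**53.
threshold = 2.0 ** 53
saturated = (threshold + 1.0 == threshold)

# Once the total reaches the threshold, extra unit-valued tokens leave no trace:
# sequences that differ only in how many such tokens follow give the same total.
after_10 = add_unit_tokens(threshold, 10)
after_1000 = add_unit_tokens(threshold, 1000)
indistinguishable = (after_10 == after_1000 == threshold)
```

Under these assumptions both `saturated` and `indistinguishable` are true: past the threshold, the accumulated value no longer depends on how many further unit-valued tokens were added.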
Abstract
The study on the expressive power of transformers shows that transformers are permutation equivariant, and they can approximate all permutation-equivariant continuous functions on a compact domain. However, these results are derived under real parameters and exact operations, while real implementations on computers can only use a finite set of numbers and inexact machine operations with round-off errors. In this work, we investigate the representability of floating-point transformers that use floating-point parameters and floating-point operations. Unlike existing results under exact operations, we first show that floating-point transformers can represent a class of non-permutation-equivariant functions even without positional encoding. Furthermore, we prove that floating-point transformers can represent all permutation-equivariant functions when the sequence length is bounded, but they…
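The round-off point above can be made concrete with a small sketch. Floating-point addition is not associative, so a sum accumulated token by token can change when the tokens are reordered, even though the exact real-valued sum cannot. This is only an illustration of that general phenomenon in Python's float64 arithmetic, not the construction used in the paper. The function name `sequential_sum` and the sample values are assumptions made for the example.

```python
def sequential_sum(values):
    """Sum left to right, rounding to float64 after every addition."""
    total = 0.0
    for v in values:
        total = total + v
    return total

# Two orderings of the same three "token" values.
tokens = [1e16, 1.0, -1e16]
permuted = [1e16, -1e16, 1.0]

# In exact arithmetic both sums equal 1. In float64, 1e16 + 1.0 rounds
# back to 1e16, so the first ordering loses the 1.0 entirely.
sum_original = sequential_sum(tokens)
sum_permuted = sequential_sum(permuted)
order_matters = (sum_original != sum_permuted)
```

Here `sum_original` is 0.0 and `sum_permuted` is 1.0. A pooling step that is permutation invariant under exact operations therefore becomes order sensitive once each addition is rounded, which hints at how non-permutation-equivariant behavior can arise without positional encoding.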
Taxonomy
Topics: Numerical Methods and Algorithms · Ferroelectric and Negative Capacitance Devices · Parallel Computing and Optimization Techniques
