Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers
Colin Wei, Yining Chen, Tengyu Ma

TL;DR
This paper introduces a formal framework for statistically meaningful (SM) approximation, which requires the approximating network to be statistically learnable. It shows that feedforward neural networks can SM approximate boolean circuits, and that transformers can SM approximate Turing machines, with polynomial sample complexity.
Contribution
It defines statistically meaningful approximation and proves that feedforward neural networks and transformers achieve it for boolean circuits and Turing machines, respectively, with polynomial sample complexity.
Findings
Neural nets can SM approximate boolean circuits with polynomial sample complexity.
Transformers can SM approximate Turing machines with polynomial sample complexity.
New tools for analyzing generalization provide tighter sample complexity bounds.
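As a rough illustration of why a feedforward ReLU network can represent a boolean circuit, the sketch below builds AND, OR and NOT gates from single ReLU units on {0, 1} inputs and composes them into an XOR circuit. This is a standard textbook-style construction assumed here for illustration, not the paper's construction, and it only addresses exact representation. It says nothing about learnability, which is what SM approximation adds. The function names are hypothetical.

```python
from itertools import product


def relu(z: float) -> float:
    """Rectified linear unit."""
    return max(0.0, z)


def and_gate(x: float, y: float) -> float:
    # One ReLU unit: outputs 1 only when x = y = 1 (for inputs in {0, 1}).
    return relu(x + y - 1)


def or_gate(x: float, y: float) -> float:
    # 1 - relu(1 - x - y) is 0 when x = y = 0 and 1 otherwise (for inputs in {0, 1}).
    return 1 - relu(1 - x - y)


def not_gate(x: float) -> float:
    # An affine map; no nonlinearity needed.
    return 1 - x


def xor_circuit(x: float, y: float) -> float:
    # XOR(x, y) = AND(OR(x, y), NOT(AND(x, y))): each gate becomes one layer of units,
    # so network size grows linearly with the number of gates in the circuit.
    return and_gate(or_gate(x, y), not_gate(and_gate(x, y)))


if __name__ == "__main__":
    for x, y in product([0, 1], repeat=2):
        print(x, y, xor_circuit(x, y))
```

The gate-by-gate mapping makes the network size proportional to the circuit size. The paper's point is the stronger statement that the sample complexity of learning such a network depends polynomially on the circuit size rather than on the (possibly overparameterized) network size.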
Abstract
A common lens to theoretically study neural net architectures is to analyze the functions they can approximate. However, constructions from approximation theory may be unrealistic and therefore less meaningful. For example, a common unrealistic trick is to encode target function values using infinite precision. To address these issues, this work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability. We study SM approximation for two function classes: boolean circuits and Turing machines. We show that overparameterized feedforward neural nets can SM approximate boolean circuits with sample complexity depending only polynomially on the circuit size, not the size of the network. In addition, we show that transformers can SM approximate Turing machines with computation time bounded by …
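The second target class is Turing machines run for a bounded number of steps. The minimal simulator below illustrates that target function class only, not the transformer construction. The step budget `max_steps` is a hypothetical parameter standing in for the time bound elided above. `run_tm`, `FLIP` and the tape conventions (blank symbol `_`, moves of -1/0/+1) are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple

# (state, read symbol) -> (next state, written symbol, head move in {-1, 0, +1})
Transition = Dict[Tuple[str, str], Tuple[str, str, int]]
BLANK = "_"


def run_tm(delta: Transition, tape_input: str, start: str, halt: str,
           max_steps: int) -> Optional[str]:
    """Run a deterministic single-tape machine for at most max_steps transitions.

    Returns the non-blank tape contents if the machine halts within the budget,
    otherwise None (the input is treated as outside the time-bounded class).
    """
    tape = dict(enumerate(tape_input))  # sparse tape: position -> symbol
    state, head = start, 0
    for _ in range(max_steps):
        if state == halt:
            break
        symbol = tape.get(head, BLANK)
        state, write, move = delta[(state, symbol)]
        tape[head] = write
        head += move
    if state != halt:
        return None
    cells = [tape[i] for i in sorted(tape)]
    return "".join(cells).strip(BLANK)


# Example machine: flip every bit left to right, then halt on the first blank.
FLIP: Transition = {
    ("q0", "0"): ("q0", "1", 1),
    ("q0", "1"): ("q0", "0", 1),
    ("q0", BLANK): ("halt", BLANK, 0),
}

if __name__ == "__main__":
    print(run_tm(FLIP, "0110", "q0", "halt", max_steps=5))
    print(run_tm(FLIP, "0110", "q0", "halt", max_steps=4))
```

On input `0110` the machine needs four flips plus one halting transition. A budget of 5 steps therefore returns `1001`, while a budget of 4 returns `None`.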
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
