The Parallelism Tradeoff: Limitations of Log-Precision Transformers
William Merrill, Ashish Sabharwal

TL;DR
This paper analyzes the computational limitations of log-precision transformers. It shows that they can be simulated by constant-depth logspace-uniform threshold circuits and argues for a fundamental parallelism tradeoff that constrains their problem-solving power.
Contribution
It gives a formal complexity-theoretic upper bound on transformers with logarithmic precision, which reveals inherent limitations, and speculatively proposes a fundamental parallelism tradeoff for model architectures.
Findings
Log-precision transformers can be simulated by constant-depth logspace-uniform threshold circuits.
If L ≠ P, such transformers cannot solve certain problems, such as linear equalities or membership in an arbitrary context-free grammar with empty productions.
These limitations appear to stem from the high parallelizability of the transformer architecture.
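To make "constant-depth threshold circuit" concrete: a threshold gate outputs 1 when a weighted sum of its Boolean inputs reaches a threshold, and the circuit families in question use polynomially many such gates with unbounded fan-in, arranged in a number of layers that does not grow with the input length. The Python sketch below only illustrates these building blocks; it is not the paper's simulation of a transformer, and the function names are hypothetical. It computes MAJORITY with a single gate and "exactly k ones" with two layers.

```python
from typing import Sequence


def threshold_gate(bits: Sequence[int], weights: Sequence[int], theta: int) -> int:
    """One threshold gate: output 1 iff sum_i weights[i] * bits[i] >= theta."""
    return int(sum(w * b for w, b in zip(weights, bits)) >= theta)


def majority(bits: Sequence[int]) -> int:
    """MAJORITY in depth 1: 1 iff strictly more than half of the inputs are 1."""
    n = len(bits)
    return threshold_gate(bits, [1] * n, n // 2 + 1)


def exactly_k(bits: Sequence[int], k: int) -> int:
    """Depth-2 circuit: 1 iff exactly k inputs are 1.

    Layer 1 computes [count >= k] and [count >= k + 1] with unweighted gates;
    layer 2 is one gate with weights (1, -1) and threshold 1, i.e. g1 AND NOT g2.
    """
    n = len(bits)
    g1 = threshold_gate(bits, [1] * n, k)
    g2 = threshold_gate(bits, [1] * n, k + 1)
    return threshold_gate([g1, g2], [1, -1], 1)


print(majority([1, 1, 0]), majority([1, 1, 0, 0]))              # 1 0
print(exactly_k([1, 0, 1, 1], 3), exactly_k([1, 0, 1, 1], 2))  # 1 0
```

The depth of each function is fixed (1 or 2) no matter how many inputs it receives; only the fan-in of the gates grows.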
Abstract
Despite their omnipresence in modern NLP, characterizing the computational power of transformer neural nets remains an interesting open question. We prove that transformers whose arithmetic precision is logarithmic in the number of input tokens (and whose feedforward nets are computable using space linear in their input) can be simulated by constant-depth logspace-uniform threshold circuits. This provides insight into the power of transformers using known results in complexity theory. For example, if L ≠ P (i.e., not all poly-time problems can be solved using logarithmic space), then transformers cannot even accurately solve linear equalities or check membership in an arbitrary context-free grammar with empty productions. Our result intuitively emerges from the transformer architecture's high parallelizability. We thus speculatively introduce the idea of a fundamental…
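Here, "arithmetic precision logarithmic in the number of input tokens" means that each stored value uses O(log n) bits for an input of n tokens. The sketch below is an illustration under our own assumptions, not the paper's construction: it assumes fixed-point rounding and a budget of ⌈log₂ n⌉ + 1 fractional bits, and all function names are hypothetical. It shows that this budget is enough to recover exactly how many tokens are 1 from a uniform-attention average, whereas a fixed small budget loses the count once n is large.

```python
from fractions import Fraction


def log_precision_bits(n: int) -> int:
    """Hypothetical budget: ceil(log2 n) + 1 fractional bits for n tokens."""
    return (n - 1).bit_length() + 1  # (n - 1).bit_length() == ceil(log2 n) for n >= 1


def quantize(x: Fraction, p: int) -> Fraction:
    """Round x to the nearest multiple of 2**-p (fixed point with p fractional bits)."""
    scale = 2 ** p
    return Fraction(round(x * scale), scale)


def count_via_uniform_attention(bits: list[int], p: int) -> int:
    """Average 0/1 token values as uniform attention would, store the average with
    p fractional bits, then recover the number of ones by rescaling by n."""
    n = len(bits)
    avg = Fraction(sum(bits), n)  # exact value k / n
    stored = quantize(avg, p)     # |stored - k/n| <= 2**-(p + 1)
    return round(stored * n)      # exact whenever n / 2**(p + 1) < 1/2


tokens = [1] * 37 + [0] * 63
print(log_precision_bits(len(tokens)))                                       # 8
print(count_via_uniform_attention(tokens, log_precision_bits(len(tokens))))  # 37
print(count_via_uniform_attention(tokens, 2))                                # 25
```

Because rounding to p fractional bits introduces an error of at most 2^-(p+1), rescaling by n leaves an error of at most n/2^(p+1), which stays below 1/2 as long as 2^p > n. That is why the required number of bits grows only logarithmically in n.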
Taxonomy
Topics: Machine Learning and Algorithms · Neural Networks and Applications
