The Parallelism Tradeoff: Limitations of Log-Precision Transformers

William Merrill; Ashish Sabharwal

arXiv:2207.00729 · cs.CC · April 28, 2023 · 1 citation

Open Access

TL;DR

This paper analyzes the computational limitations of log-precision transformers, showing that they can be simulated by constant-depth logspace-uniform threshold circuits, and speculatively proposes a fundamental parallelism tradeoff that constrains their problem-solving power.

Contribution

It provides a formal complexity-theoretic upper bound on the power of transformers with logarithmic precision and speculatively proposes a parallelism tradeoff in model architecture.

Findings

01

Log-precision transformers can be simulated by constant-depth logspace-uniform threshold circuits.

02

If $L \neq P$, such transformers cannot accurately solve linear equalities or check membership in an arbitrary context-free grammar with empty productions.

03

These limitations intuitively emerge from the high parallelizability of the transformer architecture.
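The reasoning behind finding 02 can be written as a chain of containments. The sketch below assumes two standard complexity-theoretic facts that this page does not state: logspace-uniform $\mathsf{TC}^0$ is contained in $\mathsf{L}$, and the two named problems are P-complete under logspace reductions. The symbol $\Pi$ is introduced here purely for illustration.

```latex
% Simulation result plus standard containments (assumed, not stated on this page):
\[
  \text{log-precision transformers}
  \;\subseteq\; \text{logspace-uniform } \mathsf{TC}^0
  \;\subseteq\; \mathsf{L}
  \;\subseteq\; \mathsf{P}.
\]
% Let \Pi be a problem that is P-complete under logspace reductions.
% If a log-precision transformer solved \Pi, then \Pi \in \mathsf{L},
% and every problem in \mathsf{P} would reduce to it in logspace, so:
\[
  \Pi \in \mathsf{L} \;\Longrightarrow\; \mathsf{L} = \mathsf{P}.
\]
% Contrapositive, which is the form used in finding 02:
\[
  \mathsf{L} \neq \mathsf{P} \;\Longrightarrow\; \Pi \notin \mathsf{L}
  \;\Longrightarrow\; \text{no log-precision transformer solves } \Pi.
\]
```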

Abstract

Despite their omnipresence in modern NLP, characterizing the computational power of transformer neural nets remains an interesting open question. We prove that transformers whose arithmetic precision is logarithmic in the number of input tokens (and whose feedforward nets are computable using space linear in their input) can be simulated by constant-depth logspace-uniform threshold circuits. This provides insight on the power of transformers using known results in complexity theory. For example, if $L \neq P$ (i.e., not all poly-time problems can be solved using logarithmic space), then transformers cannot even accurately solve linear equalities or check membership in an arbitrary context-free grammar with empty productions. Our result intuitively emerges from the transformer architecture's high parallelizability. We thus speculatively introduce the idea of a fundamental…
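To make the two central terms concrete, here is a minimal Python sketch of a logarithmic precision budget and of threshold gates composed into a constant-depth circuit. It is an illustration under stated assumptions, not the paper's construction: the function names and the constant `c` are hypothetical, and negation is modeled as `1 - x`.

```python
def precision_bits(n_tokens: int, c: int = 1) -> int:
    """Bits per value under a c * ceil(log2(n)) budget (c is a hypothetical constant)."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using exact integer arithmetic.
    return c * max(1, (n_tokens - 1).bit_length())


def threshold_gate(bits, k):
    """Return 1 if at least k of the input bits are 1, else 0."""
    return 1 if sum(bits) >= k else 0


def majority(bits):
    """MAJORITY as a single threshold gate: strictly more than half the inputs are 1."""
    return threshold_gate(bits, len(bits) // 2 + 1)


def exactly_k(bits, k):
    """Depth-2 threshold circuit: AND of 'at least k' and NOT 'at least k + 1'."""
    at_least_k = threshold_gate(bits, k)
    at_least_k_plus_1 = threshold_gate(bits, k + 1)
    # AND of two bits is a threshold gate with k = 2; NOT is modeled as 1 - x.
    return threshold_gate([at_least_k, 1 - at_least_k_plus_1], 2)


if __name__ == "__main__":
    print(precision_bits(1024))        # bit budget for 1024 input tokens with c = 1
    print(majority([1, 1, 0]))         # majority vote over three bits
    print(exactly_k([1, 0, 1, 0], 2))  # checks whether exactly two inputs are set
```

The depth of `exactly_k` stays at two gates however many input bits it receives, which is the sense of "constant-depth" in the abstract; only the width of each gate grows with the input.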

Taxonomy

Topics: Machine Learning and Algorithms · Ferroelectric and Negative Capacitance Devices · Neural Networks and Applications