A completely uniform transformer for parity
Alexander Kozachinskiy, Tomasz Steifer

TL;DR
This paper constructs a 3-layer uniform transformer that recognizes the parity language; neither its parameter matrices nor its positional encoding depend on the input length.
Contribution
It presents a 3-layer, constant-dimension transformer that recognizes parity with fixed parameters and a positional encoding that does not depend on the input length, improving on an earlier 2-layer construction whose positional encoding is length-dependent.
Findings
Recognizes parity with a 3-layer uniform transformer
Eliminates the need for input-length-dependent positional encoding
Uses one more layer than the 2-layer construction of Chiang and Cholak, whose positional encoding depends on the input length
Abstract
We construct a 3-layer constant-dimension transformer that recognizes the parity language, where neither the parameter matrices nor the positional encoding depend on the input length. This improves upon a construction of Chiang and Cholak, who use a positional encoding that depends on the input length (but their construction has 2 layers).
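For readers unfamiliar with the target language, the following minimal sketch shows a plain reference recognizer for parity. It is not the transformer construction from the paper. It assumes the common convention that the parity language consists of the binary strings containing an odd number of 1s; the function name in_parity is hypothetical and chosen only for illustration.

```python
def in_parity(word: str) -> bool:
    """Reference check: True if the binary string has an odd number of 1s.

    Assumption: parity is taken to mean "odd number of 1s" over the
    alphabet {0, 1}. This is a plain Python check, not a transformer.
    """
    if any(ch not in "01" for ch in word):
        raise ValueError("expected a string over the alphabet {0, 1}")
    return word.count("1") % 2 == 1


if __name__ == "__main__":
    # One fixed function handles inputs of every length; nothing in it
    # is tuned to the length of the word being checked.
    for w in ["", "1", "101", "1110", "0" * 50 + "1"]:
        print(len(w), in_parity(w))
```

The point of the sketch is only to fix what "recognizing parity" means: a single, fixed procedure must give the right answer for inputs of any length, which is the property the uniform construction achieves with fixed parameter matrices and a length-independent positional encoding.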
Taxonomy
Topics: Topological Materials and Phenomena · Ferroelectric and Negative Capacitance Devices · Neural Networks and Reservoir Computing
