A completely uniform transformer for parity

Alexander Kozachinskiy; Tomasz Steifer

arXiv:2501.02535·cs.LG·January 7, 2025

Open Access

TL;DR

This paper constructs a 3-layer, constant-dimension transformer that recognizes the parity language, where neither the parameter matrices nor the positional encoding depend on the input length.

Contribution

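It presents a 3-layer transformer construction that recognizes parity with fixed parameters and a positional encoding that does not depend on the input length. This improves on the 2-layer construction of Chiang and Cholak, whose positional encoding depends on the input length.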

Findings

01 Recognizes parity with a 3-layer, constant-dimension uniform transformer

02 Eliminates the need for input-length-dependent positional encoding

03 Uses one more layer than the 2-layer construction of Chiang and Cholak, in exchange for full independence from the input length

Abstract

We construct a 3-layer constant-dimension transformer, recognizing the parity language, where neither parameter matrices nor the positional encoding depend on the input length. This improves upon a construction of Chiang and Cholak who use a positional encoding, depending on the input length (but their construction has 2 layers).
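
For readers unfamiliar with the target language, the following is a minimal reference check for parity membership. It is not the transformer construction from the paper; it only pins down what "recognizing parity" means. The convention that a binary string belongs to the language when it contains an odd number of 1s is an assumption here, and the function names `in_parity` and `in_parity_streaming` are hypothetical.

```python
def in_parity(bits: str) -> bool:
    """Return True if the binary string has an odd number of 1s.

    Assumption: membership means "odd number of 1s"; the paper may
    phrase the convention differently.
    """
    if any(ch not in "01" for ch in bits):
        raise ValueError("expected a string over the alphabet {0, 1}")
    return bits.count("1") % 2 == 1


def in_parity_streaming(bits: str) -> bool:
    """Same check with a running XOR, reading one symbol at a time."""
    state = 0
    for ch in bits:
        if ch not in "01":
            raise ValueError("expected a string over the alphabet {0, 1}")
        state ^= int(ch)  # flips exactly when a 1 is read
    return state == 1


if __name__ == "__main__":
    for word in ["", "1", "11", "1011", "0000"]:
        print(repr(word), in_parity(word), in_parity_streaming(word))
```

Both functions accept inputs of any length with no length-dependent setup, which is the property the paper asks of the transformer's parameters and positional encoding.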

Taxonomy

Topics: Topological Materials and Phenomena · Ferroelectric and Negative Capacitance Devices · Neural Networks and Reservoir Computing