Knee-Deep in C-RASP: A Transformer Depth Hierarchy

Andy Yang; Michaël Cadilhac; David Chiang

arXiv:2506.16055 · cs.CL · January 21, 2026

TL;DR

This paper proves that, within a subclass of transformers that round to fixed precision except inside attention, deeper transformers are more expressive than shallower ones, and reports empirical results on length generalization tasks that align with the theory.

Contribution

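The paper shows that transformers rounding to fixed precision except inside attention are expressively equivalent to the programming language C-RASP, with depth preserved by the equivalence. It then proves that deeper C-RASP programs are more expressive than shallower ones, and complements the proof with an empirical study.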

Findings

01 Within the subclass studied, deeper transformers are more expressive than shallower ones.

02 The same depth result holds for transformers with positional encodings such as RoPE and ALiBi.

03 Empirical results on length generalization tasks align with the theory.

Abstract

It has been observed that transformers with greater depth (that is, more layers) have more capabilities, but can we establish formally which capabilities are gained? We answer this question with a theoretical proof followed by an empirical study. First, we consider transformers that round to fixed precision except inside attention. We show that this subclass of transformers is expressively equivalent to the programming language C-RASP and this equivalence preserves depth. Second, we prove that deeper C-RASP programs are more expressive than shallower C-RASP programs, implying that deeper transformers are more expressive than shallower transformers (within the subclass mentioned above). The same is also proven for transformers with positional encodings (like RoPE and ALiBi). These results are established by studying a temporal logic with counting operators equivalent to C-RASP. Finally,…
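To give a feel for what "depth" can mean in a counting language like C-RASP, here is a minimal Python sketch. It is not taken from the paper or its repository: the helper `prefix_count`, the function `dyck1`, and the choice of balanced parentheses as the example are all illustrative assumptions. The sketch assumes that depth corresponds to how deeply counting operations are nested: the first two counts read the input directly, while the third counts over a Boolean built from those counts.

```python
from itertools import accumulate


def prefix_count(bits):
    """For each position i, count the positions j <= i where bits[j] is true.

    Hypothetical stand-in for a C-RASP-style counting operation.
    """
    return list(accumulate(int(b) for b in bits))


def dyck1(w):
    """Check balanced parentheses using only prefix counts and comparisons."""
    if not w:
        return True
    # First level of counting: counts taken directly over input symbols.
    opens = prefix_count([c == "(" for c in w])
    closes = prefix_count([c == ")" for c in w])
    # Position-wise comparison of counts: true where a prefix closes too much.
    violation = [c > o for o, c in zip(opens, closes)]
    # Second level of counting: a count over a Boolean that itself depends
    # on first-level counts.
    num_violations = prefix_count(violation)
    # Accept at the last position: no violations and equal totals.
    return num_violations[-1] == 0 and opens[-1] == closes[-1]


if __name__ == "__main__":
    for s in ["(())()", "())(", "(()"]:
        print(repr(s), dyck1(s))
```

The nested count is what lets the program check a condition at every prefix rather than only at the final position. The languages the paper actually uses to separate depth levels are not reproduced here.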

Code & Models

Repositories

pentagonalize/crasp_depth (PyTorch, official)

Videos

Knee-Deep in C-RASP: A Transformer Depth Hierarchy · SlidesLive

Taxonomy

Topics: Logic, programming, and type systems · Constraint Satisfaction and Optimization · Formal Methods in Verification