Ehrenfeucht-Haussler Rank and Chain of Thought
Pablo Barceló, Alexander Kozachinskiy, Tomasz Steifer

TL;DR
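This paper characterizes the rank of a Boolean function as the minimum number of Chain of Thought (CoT) steps a single-layer Transformer with hard attention needs to compute it, establishes tight bounds on CoT steps for specific problems, and discusses implications for PAC learning and multi-head Transformer models.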
Contribution
It provides a novel Transformer-based rank characterization, tight bounds on CoT steps for specific functions, and analyzes PAC-learnability of functions with bounded multi-head rank.
Findings
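The rank of a Boolean function equals the minimum number of CoT steps a single-layer Transformer with hard attention needs to compute it.
\(\ell\)-fold function composition requires exactly \(\ell\) CoT steps, and finding the position of the \(k\)-th occurrence of 1 in a Boolean sequence requires \(k\) CoT steps.
The paper analyzes PAC-learnability of functions with bounded multi-head rank.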
Abstract
The notion of \emph{rank} of a Boolean function has been a cornerstone in PAC learning theory, enabling quasipolynomial-time learning algorithms for polynomial-size decision trees. We present a novel characterization of rank, grounded in the well-known Transformer architecture. We show that the rank of a function \(f\) corresponds to the minimum number of \emph{Chain of Thought} (CoT) steps required by a single-layer Transformer with hard attention to compute \(f\). Based on this characterization we establish tight bounds on the number of CoT steps required for specific problems, showing that \(\ell\)-fold function composition necessitates exactly \(\ell\) CoT steps. Furthermore, we analyze the problem of identifying the position of the \(k\)-th occurrence of 1 in a Boolean sequence, proving that it requires \(k\) CoT steps. Finally, we introduce the notion of the multi-head rank that…
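For readers unfamiliar with rank, the following is a minimal brute-force sketch of the classical decision-tree definition, not the Transformer-based characterization from the paper. It assumes the standard recursion: a leaf has rank 0, and a node whose subtrees have ranks r0 and r1 has rank r0 + 1 if r0 = r1 and max(r0, r1) otherwise; the rank of a function is the smallest rank of any decision tree computing it. The helper names `combine`, `restrict`, `rank`, and `table_of` are hypothetical and chosen only for this illustration. The search is exponential, so it is only practical for a handful of variables.

```python
from functools import lru_cache


def combine(r0, r1):
    # Rank of a decision-tree node whose two subtrees have ranks r0 and r1.
    return r0 + 1 if r0 == r1 else max(r0, r1)


def restrict(table, i, bit):
    # Truth table of the function obtained by fixing variable i to `bit`.
    # Variable i corresponds to bit i of the truth-table index.
    return tuple(v for idx, v in enumerate(table) if (idx >> i) & 1 == bit)


@lru_cache(maxsize=None)
def rank(table):
    # Minimum rank over all decision trees computing the function given
    # by `table`, a tuple of 2**n output bits.
    if len(set(table)) == 1:
        return 0  # constant function: a single leaf suffices
    n = len(table).bit_length() - 1
    # Try every variable as the root query and recurse on both restrictions.
    return min(
        combine(rank(restrict(table, i, 0)), rank(restrict(table, i, 1)))
        for i in range(n)
    )


def table_of(fn, n):
    # Build the truth table of fn over n Boolean inputs (hypothetical helper).
    return tuple(
        fn([(idx >> i) & 1 for i in range(n)]) for idx in range(2 ** n)
    )


and3 = table_of(lambda x: int(all(x)), 3)
parity3 = table_of(lambda x: sum(x) % 2, 3)
print(rank(and3), rank(parity3))
```

Under this definition, the conjunction of three variables has rank 1 (a single path of queries suffices), while the parity of three variables has rank 3, since every restriction of parity is again a parity on one fewer variable.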
Taxonomy
Topics: Philosophy, Science, and History
Methods: Attention Is All You Need · Adam · Softmax · Absolute Position Encodings · Residual Connection · Dropout · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer
