Causal Dimensionality of Transformer Representations: Measurement, Scaling, and Layer Structure
Nilesh Sarkar, Dawar Jyoti Deka

TL;DR
This paper introduces a new measure called causal dimensionality to quantify the causal influence of transformer representations, demonstrating its invariance to model scaling and its structural variation across layers.
Contribution
The paper defines and empirically validates causal dimensionality as a model-intrinsic property of transformer layers, measurable via an SAE width sweep and invariant to model scaling.
Findings
Causal dimensionality remains constant across different model sizes.
Representational capacity grows faster than causal capacity with SAE width.
Causal dimensionality is invariant across network depths and model scales.
Abstract
Sparse autoencoders (SAEs) decompose transformer residual streams into interpretable feature dictionaries, yet the relationship between SAE width and causal influence on model output has not been systematically characterised. We introduce causal dimensionality kappa(L, M, T), defined as the effective rank of the expected Jacobian outer product at layer L, and show that it can be estimated via an SAE width sweep paired with attribution patching. Across seven SAE widths from 16,384 to 1,048,576 features on Gemma-2-2B layer 12, representational capacity grows 15.6x while causal capacity grows only 4.35x, a robust separation we term the representational-causal wedge. A saturating fit yields kappa-hat approximately 1,990 (kappa-hat / d_model approximately 0.86) and a participation-ratio lower bound kappa_PR approximately 280. Crucially, kappa is invariant to model scaling: Gemma-2-9B and Gemma-2-2B yield…
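The abstract relies on two spectral summaries of the expected Jacobian outer product and on a saturating fit of causal capacity against SAE width. The NumPy sketch below illustrates those quantities under stated assumptions; it is not the authors' code. The abstract does not say which effective-rank definition is used, so the sketch assumes the entropy-based effective rank (the exponential of the Shannon entropy of the normalised eigenvalues). The participation ratio equals the exponential of the order-2 Rényi entropy, and because Rényi entropy is non-increasing in its order, the participation ratio can never exceed the entropy-based effective rank, which is consistent with kappa_PR serving as a lower bound. The saturating form C(w) = kappa * w / (w + w_half), all helper names, and the Jacobian tensor and width curve in the demo are hypothetical and synthetic, not the paper's data.

```python
import numpy as np


def expected_jacobian_outer_product(jacobians: np.ndarray) -> np.ndarray:
    """Estimate G = E[J^T J] from sampled Jacobians.

    Assumed input: shape (n_samples, d_out, d_model), one Jacobian of some
    downstream output w.r.t. the layer-L residual stream per sample.
    Returns a symmetric positive semi-definite (d_model, d_model) matrix.
    """
    # G[i, j] = mean over samples of sum over outputs of J[n, o, i] * J[n, o, j]
    return np.einsum("noi,noj->ij", jacobians, jacobians) / jacobians.shape[0]


def spectrum(G: np.ndarray) -> np.ndarray:
    """Eigenvalues of a symmetric PSD matrix, with round-off negatives clipped to 0."""
    return np.clip(np.linalg.eigvalsh(G), 0.0, None)


def participation_ratio(eig: np.ndarray) -> float:
    """PR = (sum lambda)^2 / sum lambda^2, i.e. exp of the order-2 Renyi entropy."""
    return float(eig.sum() ** 2 / np.square(eig).sum())


def entropy_effective_rank(eig: np.ndarray) -> float:
    """Assumed definition: exp of the Shannon entropy of normalised eigenvalues."""
    p = eig / eig.sum()
    p = p[p > 0]  # 0 * log 0 is taken as 0
    return float(np.exp(-(p * np.log(p)).sum()))


def fit_saturating(widths, causal_capacity):
    """Hypothetical fit of C(w) = kappa * w / (w + w_half).

    Uses the linearised form 1/C = 1/kappa + (w_half / kappa) * (1/w),
    solved by ordinary least squares on (1/w, 1/C).
    """
    x = 1.0 / np.asarray(widths, dtype=float)
    y = 1.0 / np.asarray(causal_capacity, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # highest degree first
    kappa = 1.0 / intercept
    w_half = slope * kappa
    return kappa, w_half


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    J = rng.normal(size=(256, 4, 64))  # synthetic Jacobians, d_model = 64
    eig = spectrum(expected_jacobian_outer_product(J))
    pr, er = participation_ratio(eig), entropy_effective_rank(eig)
    print(f"PR = {pr:.1f}, entropy effective rank = {er:.1f}")  # PR <= erank always

    # Seven widths 16,384 ... 1,048,576, matching the sweep described above;
    # the capacity values are a noise-free synthetic curve, not measurements.
    widths = 16_384 * 2 ** np.arange(7)
    synthetic = 2000.0 * widths / (widths + 50_000.0)
    print(fit_saturating(widths, synthetic))  # ≈ (2000.0, 50000.0)
```

The linearised fit is chosen here only because it needs nothing beyond NumPy. On noisy measurements it over-weights the smallest widths, so a nonlinear least-squares fit would likely be a better choice for real data.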
