Same Geometry, Opposite Noise: Transformer Magnitude Representations Lack Scalar Variability

Jon-Paul Cacioli

arXiv:2604.04469·cs.CL·April 7, 2026

TL;DR

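This study shows that transformer language models do not exhibit scalar variability in their magnitude representations: unlike in biological magnitude systems, representational variability decreases with magnitude rather than growing in proportion to it, even though the models capture some geometric properties of magnitude.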

Contribution

It provides evidence that distributional learning alone does not produce scalar variability in transformer models' magnitude representations.

Findings

01 Representational variability decreases with magnitude in transformers.

02 The scaling exponent is negative across models and layers (alpha approx -0.19 along the magnitude axis; no primary layer showed alpha > 0).

03 Corpus frequency strongly predicts per-magnitude variability.

Abstract

Scalar variability -- the finding that representational noise scales proportionally with magnitude, producing a constant coefficient of variation -- is a hallmark of biological magnitude systems. We tested whether transformer language models exhibit this property by analysing the dispersion of hidden-state representations across carrier sentences for 26 numerical magnitudes in three 7-8B parameter models (Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.3, Llama-3-8B-Base; data from Cacioli, 2026). We found the opposite: representational variability decreased with magnitude along the magnitude axis (scaling exponent alpha approx -0.19; 0/16 primary layers with alpha > 0, all three models). The negative sign was consistent in full-dimensional space (alpha approx -0.04) and after sentence-identity correction (alpha approx -0.007). The anti-scalar pattern was 3-5x stronger along the magnitude…
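To make the scaling exponent concrete, here is a minimal sketch of one common way to estimate it: regress log dispersion on log magnitude and take the slope as alpha. Under scalar variability the dispersion grows in proportion to magnitude, so alpha is about 1 and the coefficient of variation is constant; a negative alpha means dispersion shrinks as magnitude grows. The function name `scaling_exponent`, the magnitude grid 1 to 26, and the synthetic dispersion values are all assumptions made for illustration. They are not the paper's data, and the paper's exact dispersion measure and fitting procedure may differ.

```python
import numpy as np

def scaling_exponent(magnitudes, dispersions):
    """Least-squares slope of log(dispersion) against log(magnitude).

    Hypothetical helper for illustration; not the paper's pipeline.
    """
    log_m = np.log(np.asarray(magnitudes, dtype=float))
    log_d = np.log(np.asarray(dispersions, dtype=float))
    slope, _intercept = np.polyfit(log_m, log_d, 1)
    return float(slope)

# Synthetic illustration only: 26 hypothetical magnitudes, made-up dispersions.
magnitudes = np.arange(1, 27, dtype=float)

# Scalar variability: SD proportional to magnitude, so CV = SD / magnitude is constant.
scalar_sd = 0.15 * magnitudes

# Anti-scalar pattern: SD shrinks as magnitude grows (exponent chosen to echo -0.19).
anti_scalar_sd = 2.0 * magnitudes ** -0.19

alpha_scalar = scaling_exponent(magnitudes, scalar_sd)
alpha_anti = scaling_exponent(magnitudes, anti_scalar_sd)

print(f"scalar variability: alpha = {alpha_scalar:.2f}")
print(f"anti-scalar pattern: alpha = {alpha_anti:.2f}")
```

In the scalar case the ratio `scalar_sd / magnitudes` is the same for every magnitude, which is what a constant coefficient of variation means. In the anti-scalar case the fitted slope is negative, which is the sign the abstract reports.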
