Concepts Whisper While Syntax Shouts: Spectral Anti-Concentration and the Dual Geometry of Transformer Representations

Pratyush Acharya; Nuraj Rimal; Habish Dhakal

arXiv:2605.01609·cs.LG·May 5, 2026

TL;DR

This paper investigates the spectral geometry of transformer representations and finds a dual geometry: activation-space concept directions anti-concentrate in the spectral tail, while syntax concentrates in high-variance directions. This suggests that semantic content is rotated into spectrally quiet regions.

Contribution

The paper uncovers a dual spectral geometry in transformer representations and shows that concepts and syntax occupy different spectral subspaces.

Findings

01. Anti-concentration is observed in residual-stream difference vectors across architectures.

02. Activation-space concept directions anti-concentrate in the spectral tail.

03. Syntax is encoded in high-variance subspaces in most architectures.

Abstract

We test whether the causal inner product of Park et al. (2024), defined by the unembedding covariance $\Sigma$, enables cross-lingual concept transport. Across 17 models and 4 language pairs, a matched-spectrum randomization test finds that Whitened Causal Alignment is indistinguishable from spectral regularization alone ($p = 0.95$). However, this failure reveals a broader phenomenon: anti-concentration is observed in residual-stream difference-of-means vectors across five architecture families ($p < 10^{-33}$) and supported by SAE features (e.g., $p = 4.5 \times 10^{-19}$) and linear probes on Gemma and Llama. We discover a dual geometry: activation-space concept directions anti-concentrate in the spectral tail, while static unembedding-row contrasts concentrate in high-variance directions ($p < 10^{-4}$). Split-injection causal interventions support the…
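To make "anti-concentration" concrete, the following is a minimal sketch on synthetic data, not the authors' procedure or their matched-spectrum randomization test. It measures what fraction of a difference-of-means vector's squared norm falls in the top-k eigenvectors of an activation covariance, and compares that with the k/d fraction expected for an isotropic random direction. The function name `top_k_energy_fraction`, the synthetic spectrum, the planted shift, and the choice of k are all illustrative assumptions.

```python
import numpy as np


def top_k_energy_fraction(direction, cov, k):
    """Fraction of a direction's squared norm inside the top-k eigenspace of cov."""
    # eigh returns eigenvalues in ascending order, so the last k columns
    # of eigvecs span the highest-variance subspace.
    _, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, -k:]
    unit = direction / np.linalg.norm(direction)
    coeffs = top.T @ unit
    return float(coeffs @ coeffs)


rng = np.random.default_rng(0)
d, n, k = 64, 2000, 8

# Synthetic "activations" with a decaying variance spectrum (assumption).
scales = np.linspace(5.0, 0.1, d)
acts_a = rng.normal(size=(n, d)) * scales
acts_b = rng.normal(size=(n, d)) * scales

# Plant a hypothetical concept shift in the lowest-variance coordinate.
acts_b[:, -1] += 3.0

# Background covariance estimated from one class only, so the planted
# shift does not inflate the variance of the direction it lives in.
cov = np.cov(acts_a, rowvar=False)

# Difference-of-means concept direction between the two classes.
diff = acts_b.mean(axis=0) - acts_a.mean(axis=0)

frac = top_k_energy_fraction(diff, cov, k)
baseline = k / d  # expected fraction for an isotropic random direction

print(f"top-{k} energy fraction: {frac:.4f} (isotropic baseline {baseline:.4f})")
```

A fraction well below the baseline means the direction avoids the high-variance subspace, which is the sense of anti-concentration in this sketch; a fraction well above it would correspond to concentration.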
