Where Pretraining writes and Alignment reads: the asymmetry of Transformer weight space

Valeria Ruscio; Eli-Shaoul Khedouri; Keiran Thompson

arXiv:2605.16600·cs.LG·May 19, 2026

TL;DR

Pretraining and preference alignment update the same transformer weights but leave geometrically distinct traces. This paper measures that asymmetry with a relative-subspace-fraction probe and explains it through anisotropic gradient accumulation, combining empirical and theoretical analysis.

Contribution

The paper characterizes the geometric asymmetry between pretraining and alignment updates to transformer weights, explains it via anisotropic gradient accumulation, and provides causal evidence for that explanation.

Findings

01

Alignment updates concentrate in the read pathway ($W_Q$, $W_K$).

02

Pretraining induces prediction geometry in the write pathway ($W_O$, $W_2$).

03

Gradient anisotropy explains the observed weight-space patterns.

Abstract

Cross-entropy pretraining and preference alignment update the same transformer weights, but leave geometrically distinct traces. We characterise this asymmetry with a relative-subspace-fraction probe that tracks how weight deltas align with residual-stream activation subspaces and with the prediction subspace defined by the unembedding. Alignment deltas concentrate in the read pathway ($W_{Q}$, $W_{K}$), along principal directions of attention-input activations, while remaining near-isotropic in the write pathway ($W_{O}$, $W_{2}$) relative to the prediction subspace. We explain this pattern through anisotropic gradient accumulation: updates to a matrix $W$ are sums of outer products $\delta_{t} a_{t}^{\top}$, and inherit directional structure from whichever side has concentrated covariance. For read-pathway matrices, this side is the input activation $a_{t}$, whose covariance is spiked in trained…
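
The outer-product argument in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's probe, whose exact definition is not given in this excerpt. It assumes independent Gaussian $\delta_t$ and $a_t$ with a single high-variance ("spiked") coordinate on one side, and the names `accumulate_update`, `col_fraction`, and `row_fraction` are hypothetical. It sums outer products $\delta_t a_t^\top$ and reports the share of the update's squared Frobenius norm that lands on one input coordinate (a column) or one output coordinate (a row).

```python
import random


def accumulate_update(d_out, d_in, steps, in_spike=1.0, out_spike=1.0, seed=0):
    """Return sum_t delta_t a_t^T as a d_out x d_in nested list.

    Toy assumption: delta_t and a_t are independent standard Gaussians,
    except that coordinate 0 of a_t is scaled by `in_spike` and
    coordinate 0 of delta_t is scaled by `out_spike`.
    """
    rng = random.Random(seed)
    update = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(steps):
        delta = [rng.gauss(0.0, 1.0) for _ in range(d_out)]
        a = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
        delta[0] *= out_spike  # concentrated covariance on the output side
        a[0] *= in_spike       # concentrated covariance on the input side
        for i in range(d_out):
            row = update[i]
            d_i = delta[i]
            for j in range(d_in):
                row[j] += d_i * a[j]
    return update


def _total_energy(update):
    """Squared Frobenius norm of the update."""
    return sum(x * x for row in update for x in row)


def col_fraction(update, j):
    """Share of the squared Frobenius norm carried by input coordinate j."""
    return sum(row[j] ** 2 for row in update) / _total_energy(update)


def row_fraction(update, i):
    """Share of the squared Frobenius norm carried by output coordinate i."""
    return sum(x * x for x in update[i]) / _total_energy(update)


if __name__ == "__main__":
    d = 16
    spiked_in = accumulate_update(d, d, 200, in_spike=10.0)
    spiked_out = accumulate_update(d, d, 200, out_spike=10.0)
    isotropic = accumulate_update(d, d, 200)
    print("isotropic baseline 1/d:", 1.0 / d)
    print("input spike, column 0 share:", col_fraction(spiked_in, 0))
    print("output spike, row 0 share:", row_fraction(spiked_out, 0))
    print("no spike, column 0 share:", col_fraction(isotropic, 0))
```

Under these toy assumptions the expected column share for an input spike of scale $s$ is $s^2 / (s^2 + d_{\text{in}} - 1)$, roughly 0.87 for $s = 10$ and $d_{\text{in}} = 16$, against an isotropic baseline of $1/16$. Real gradients are neither independent nor Gaussian, so this only shows the direction of the effect, not its size in a trained model.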
