Transformer Injectivity & Geometric Robustness - Analytic Margins and Bi-Lipschitz Uniformity of Sequence-Level Hidden States

Mikael von Strauss

arXiv:2511.14808 · cs.LG · November 20, 2025

TL;DR

This paper investigates the injectivity and geometric robustness of Transformer models, providing theoretical conditions for injectivity and empirical diagnostics to assess invertibility and stability of sequence representations.

Contribution

It introduces a formal framework for analyzing injectivity in Transformers, including a dichotomy theorem and geometric diagnostics, with empirical validation on pretrained models.
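One way to picture a geometric diagnostic in the bi-Lipschitz sense of the title is to estimate, on a finite prompt set, the smallest and largest ratio between hidden-state distance and prompt distance. A positive lower ratio means no two prompts collapse together. The sketch below is a minimal illustration under stated assumptions. It is not the paper's diagnostic. The helper names `empirical_bilipschitz` and `hamming`, and the choice of Hamming distance on equal-length token sequences as the input metric, are hypothetical.

```python
import itertools
import numpy as np

def hamming(a, b):
    """Assumed input metric: number of positions where two equal-length prompts differ."""
    return sum(x != y for x, y in zip(a, b))

def empirical_bilipschitz(prompts, hidden, input_dist=hamming):
    """Return (m, M), the smallest and largest ratio ||h_i - h_j|| / d(p_i, p_j)
    over pairs of distinct prompts. m == 0 signals a collision on this set."""
    ratios = []
    for i, j in itertools.combinations(range(len(prompts)), 2):
        d_in = input_dist(prompts[i], prompts[j])
        if d_in == 0:
            continue  # identical prompts say nothing about the constants
        d_out = float(np.linalg.norm(hidden[i] - hidden[j]))
        ratios.append(d_out / d_in)
    if not ratios:
        raise ValueError("need at least two distinct prompts")
    return min(ratios), max(ratios)

# Usage: three toy prompts whose (made-up) hidden states scale uniformly with distance.
prompts = [(0, 1), (0, 2), (1, 2)]
hidden = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
print(empirical_bilipschitz(prompts, hidden))
```

On a real model, `hidden` would hold one layer's last-token hidden states, one row per prompt. Tracking m and M layer by layer gives a rough, set-dependent picture of how uniformly separated the representations are.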

Findings

01. Transformers are generically injective in the continuous-parameter limit.

02. Quantization at 4 bits induces small collisions and reduces geometric robustness.

03. Layerwise diagnostics reveal stability of representations across training and model scales.
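To see how coarse quantization can turn distinct representations into collisions, the sketch below applies symmetric per-tensor 4-bit quantization to a few vectors and counts identical pairs before and after. This is a simplified stand-in under explicit assumptions. It quantizes the hidden states directly, while the paper's quantization setting may differ, for example by quantizing weights group-wise. The names `quantize_4bit` and `count_collisions` are hypothetical.

```python
import itertools
import numpy as np

def quantize_4bit(h):
    """Assumed scheme: symmetric per-tensor 4-bit quantization.
    Integer codes lie in [-8, 7] and share a single scale; returns dequantized values."""
    scale = np.max(np.abs(h)) / 7.0
    if scale == 0:
        return np.zeros_like(h)
    codes = np.clip(np.round(h / scale), -8, 7)
    return codes * scale

def count_collisions(hidden, tol=0.0):
    """Number of pairs (i, j), i < j, whose vectors lie within tol (Euclidean)."""
    return sum(
        1
        for i, j in itertools.combinations(range(len(hidden)), 2)
        if np.linalg.norm(hidden[i] - hidden[j]) <= tol
    )

# Usage: the first two rows differ by 0.01, which is below one quantization step (1/7).
h = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]])
print(count_collisions(h), count_collisions(quantize_4bit(h)))
```

Any two vectors whose coordinates round to the same codes become exactly equal. This is the mechanism by which a map that is injective in full precision can acquire collisions at low bit width.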

Abstract

Under real-analytic assumptions on decoder-only Transformers, recent work shows that the map from discrete prompts to last-token hidden states is generically injective on finite prompt sets. We refine this picture: for each layer $\ell$ we define a collision discriminant $\Delta^{\ell} \subset \Theta$ and an injective stratum $U^{\ell} = \Theta \setminus \Delta^{\ell}$, and prove a dichotomy: either the model is nowhere injective on the set, or $U^{\ell}$ is open and dense and every $F_{\theta}^{\ell}$ with $\theta \in U^{\ell}$ is injective. Under mild non-singularity assumptions on the optimizer and an absolutely continuous initialization, generic injectivity persists along smooth training trajectories over any fixed horizon. We also treat symmetry groups $G$, showing that discriminants and injective strata descend to the quotient $\Theta / G$, so injectivity is naturally a property of functional equivalence classes. We…
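The dichotomy can be illustrated on a toy real-analytic map that stands in for $F_{\theta}^{\ell}$. The map is not a Transformer. It applies tanh to a linear map of the concatenated token embeddings of a prompt, with $\theta = (E, W)$. Gaussian parameters (an absolutely continuous draw) give an injective map on the finite prompt set with probability one, because $W$ is almost surely invertible and tanh is injective elementwise. The degenerate choice $W = 0$ sends every prompt to the same point, so it lies in the discriminant. The function names and dimensions below are hypothetical choices made for this sketch.

```python
import itertools
import numpy as np

def toy_last_hidden(prompt, E, W):
    """Hypothetical stand-in for F_theta: tanh(W @ concat(E[t] for t in prompt))."""
    x = np.concatenate([E[t] for t in prompt])
    return np.tanh(W @ x)

def separation_margin(prompts, E, W):
    """Smallest pairwise distance between the prompts' images; 0.0 means a collision."""
    H = np.stack([toy_last_hidden(p, E, W) for p in prompts])
    return min(
        float(np.linalg.norm(H[i] - H[j]))
        for i, j in itertools.combinations(range(len(H)), 2)
    )

rng = np.random.default_rng(0)
vocab, seq_len, d_emb, d_hid = 3, 2, 4, 8
prompts = list(itertools.product(range(vocab), repeat=seq_len))  # all 9 prompts
E = rng.standard_normal((vocab, d_emb))
W = rng.standard_normal((d_hid, seq_len * d_emb))  # square 8 x 8, a.s. invertible

print(separation_margin(prompts, E, W))                  # generic theta: positive margin
print(separation_margin(prompts, E, np.zeros_like(W)))  # degenerate theta: collision
```

Because at least one injective parameter exists for this toy family, the behaviour matches the second branch of the dichotomy. Collisions occur only on a thin exceptional set of parameters, such as $W = 0$.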

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Advanced Memory and Neural Computing