Transformer Injectivity & Geometric Robustness - Analytic Margins and Bi-Lipschitz Uniformity of Sequence-Level Hidden States
Mikael von Strauss

TL;DR
This paper investigates the injectivity and geometric robustness of Transformer models, providing theoretical conditions for injectivity and empirical diagnostics to assess invertibility and stability of sequence representations.
Contribution
It introduces a formal framework for analyzing injectivity in Transformers, including a dichotomy theorem and geometric diagnostics, with empirical validation on pretrained models.
Findings
Transformers are generically injective in the continuous-parameter limit.
Quantization at 4 bits induces small collisions and reduces geometric robustness.
Layerwise diagnostics reveal stability of representations across training and model scales.
Abstract
Under real-analytic assumptions on decoder-only Transformers, recent work shows that the map from discrete prompts to last-token hidden states is generically injective on finite prompt sets. We refine this picture. For each layer we define a collision discriminant and an injective stratum, and prove a dichotomy: either the model is nowhere injective on the set, or the injective stratum is open and dense and every model in it is injective. Under mild non-singularity assumptions on the optimizer and an absolutely continuous initialization, generic injectivity persists along smooth training trajectories over any fixed horizon. We also treat symmetry groups, showing that discriminants and injective strata descend to the quotient by the symmetry group, so injectivity is naturally a property of functional equivalence classes. We…
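As a rough illustration of what a collision check on a finite prompt set might look like, the sketch below measures the smallest pairwise distance between hidden-state vectors and counts exact collisions before and after quantization. Everything here is an assumption made for illustration: the function names, the toy two-dimensional "hidden states", and the uniform quantizer on [-1, 1] are hypothetical stand-ins, not the paper's diagnostics or its quantization scheme.

```python
import itertools
import math


def collision_margin(states):
    """Smallest Euclidean distance between any two vectors in `states`."""
    return min(math.dist(a, b) for a, b in itertools.combinations(states, 2))


def count_collisions(states, tol=0.0):
    """Number of pairs whose distance is at most `tol` (tol=0.0 means exact ties)."""
    return sum(
        1 for a, b in itertools.combinations(states, 2) if math.dist(a, b) <= tol
    )


def quantize(vec, bits, lo=-1.0, hi=1.0):
    """Hypothetical uniform quantizer: clip to [lo, hi], snap to 2**bits levels."""
    step = (hi - lo) / (2 ** bits - 1)
    return tuple(lo + round((min(max(x, lo), hi) - lo) / step) * step for x in vec)


# Toy stand-ins for last-token hidden states of three distinct prompts.
states = [(0.10, 0.22), (0.12, 0.23), (-0.50, 0.90)]

# Distinct prompts map to distinct vectors, so the margin is positive.
margin = collision_margin(states)
collisions_fp = count_collisions(states)

# Quantizing the vectors to 4 bits merges the two nearby vectors into one.
states_q4 = [quantize(s, bits=4) for s in states]
collisions_q4 = count_collisions(states_q4)

print(f"margin={margin:.4f} fp_collisions={collisions_fp} q4_collisions={collisions_q4}")
```

In this toy setting the full-precision vectors are pairwise distinct, while the 4-bit version collapses the two nearby vectors into one point. This mirrors, in a very simplified form, the finding that coarse quantization can introduce collisions.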
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Advanced Memory and Neural Computing
