Support Tokens, Stability Margins, and a New Foundation for Robust LLMs
Deepak Agarwal, Dhyey Dharmendrakumar Mavani, Suyash Gupta, Karthik Sethuraman, Tejas Dharamsi

TL;DR
This paper reinterprets causal self-attention transformers within a probabilistic framework, revealing stability margins and support tokens, and introduces a Bayesian MAP training method that enhances robustness and margin geometry in LLMs.
Contribution
The paper provides a probabilistic reformulation of causal self-attention transformers, introduces the concepts of support tokens and stability margins, and proposes a minimal modification to training (a Bayesian MAP objective) for improved robustness.
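For orientation, the general form of maximum a posteriori (MAP) estimation can be written as below. This is only the textbook definition; the specific prior the paper uses is not stated in this summary, so p(θ) is a placeholder, and the symbols θ (model parameters) and D (training data) are notation assumed here, not taken from the paper.

```latex
\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta} \bigl[ \log p(\mathcal{D} \mid \theta) + \log p(\theta) \bigr]
  = \arg\min_{\theta} \bigl[
      \underbrace{-\log p(\mathcal{D} \mid \theta)}_{\text{usual training loss}}
      \;\underbrace{-\,\log p(\theta)}_{\text{prior penalty}}
    \bigr]
```

Read this way, a MAP objective differs from ordinary maximum-likelihood training only by the added log-prior term, which is one sense in which such a change to training can be called minimal.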
Findings
Improved robustness to input perturbations.
Enhanced margin geometry of learned representations.
Maintained out-of-sample accuracy.
Abstract
Self-attention is usually described as a flexible, content-adaptive way to mix a token with information from its past. We reinterpret causal self-attention transformers, the backbone of modern foundation models, within a probabilistic framework, much as classical PCA is extended to probabilistic PCA. This reformulation reveals a key structural consequence of the underlying change of variables: a barrier constraint emerges on the parameters of self-attention. The resulting geometry exposes a degeneracy boundary where the attention-induced mapping becomes locally ill-conditioned, yielding a stability-margin interpretation analogous to the margin in support vector machines. This, in turn, naturally gives rise to the concept of support tokens. We further show that causal transformers define a consistent stochastic process over infinite token sequences, providing a rigorous probabilistic…
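The abstract's opening description of self-attention, a content-adaptive mix of a token with information from its past, can be illustrated with a minimal sketch of standard single-head causal self-attention. This is the textbook mechanism only, not the paper's probabilistic reformulation; the function name `causal_self_attention`, the weight matrices `w_q`, `w_k`, `w_v`, and the toy inputs are hypothetical names chosen for illustration.

```python
import math


def causal_self_attention(x, w_q, w_k, w_v):
    """Standard single-head causal self-attention (illustrative sketch).

    x is a list of token vectors; w_q, w_k, w_v are projection matrices
    given as lists of rows. Returns (outputs, weights).
    """
    def matvec(w, v):
        # Matrix-vector product W v, with W stored as a list of rows.
        return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]

    queries = [matvec(w_q, tok) for tok in x]
    keys = [matvec(w_k, tok) for tok in x]
    values = [matvec(w_v, tok) for tok in x]
    scale = 1.0 / math.sqrt(len(queries[0]))

    outputs, weights = [], []
    for t, q in enumerate(queries):
        # Causal mask: position t only scores positions 0..t (its past and itself).
        scores = [scale * sum(a * b for a, b in zip(q, keys[s]))
                  for s in range(t + 1)]
        # Numerically stable softmax over the visible positions.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # Output at t is a weighted average of the visible value vectors.
        outputs.append([sum(row[s] * values[s][d] for s in range(t + 1))
                        for d in range(len(values[0]))])
    return outputs, weights


# Toy usage with identity projections (assumed values, for illustration only).
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(tokens, identity, identity, identity)
```

Each row of `weights` is a probability distribution over the current and earlier positions, so row `t` has `t + 1` entries that sum to one. The first token can only attend to itself, so its output equals its own value vector.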
Taxonomy
Topics: Topic Modeling · Authorship Attribution and Profiling · Generative Adversarial Networks and Image Synthesis
