Keys in the Weights: Transformer Authentication Using Model-Bound Latent Representations
Ayşe S. Okatan, Mustafa İlhan Akbaş, Laxima Niure Kandel, Berker Peköz

TL;DR
This paper presents MoBLE, a novel decoder-binding property in Transformer autoencoders that enables model-based authentication through latent representations, facilitating secure AI deployment without secret injection.
Contribution
It introduces Zero-Shot Decoder Non-Transferability (ZSDN) as a formal decoder-binding metric and demonstrates its application to model authentication and access control in Transformer models.
Findings
Self-decoding achieves over 0.91 exact match
Zero-shot cross-decoding drops to chance levels
Weight-space and attention diagnostics support the binding hypothesis
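As a rough illustration of how the two reported scores could be computed, the sketch below contrasts a self-decoding output with a cross-decoding output on an identity task. The function names, the position-wise definition of token accuracy, and the toy token sequences are all assumptions made for illustration; they are not taken from the paper and do not reproduce its numbers.

```python
from typing import Sequence


def exact_match(preds: Sequence[Sequence[int]], refs: Sequence[Sequence[int]]) -> float:
    """Fraction of sequences reproduced token-for-token (assumed definition)."""
    if len(preds) != len(refs) or not refs:
        raise ValueError("preds and refs must be non-empty and of equal length")
    hits = sum(1 for p, r in zip(preds, refs) if list(p) == list(r))
    return hits / len(refs)


def token_accuracy(preds: Sequence[Sequence[int]], refs: Sequence[Sequence[int]]) -> float:
    """Fraction of reference positions whose token is predicted correctly (assumed definition)."""
    if len(preds) != len(refs) or not refs:
        raise ValueError("preds and refs must be non-empty and of equal length")
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references contain no tokens")
    # zip stops at the shorter sequence, so missing predictions count as errors.
    correct = sum(
        sum(1 for a, b in zip(p, r) if a == b) for p, r in zip(preds, refs)
    )
    return correct / total


# Hypothetical toy data: in an identity task the reference is the input itself.
refs = [[1, 2, 3], [4, 5, 6]]
self_out = [[1, 2, 3], [4, 5, 0]]   # decoder paired with its own encoder
cross_out = [[9, 9, 9], [4, 0, 0]]  # decoder from a differently seeded model

print("self :", exact_match(self_out, refs), token_accuracy(self_out, refs))
print("cross:", exact_match(cross_out, refs), token_accuracy(cross_out, refs))
```

In this toy setup the self-decoding output scores high on both metrics while the cross-decoding output has no exact matches, which is the qualitative gap the findings describe.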
Abstract
We introduce Model-Bound Latent Exchange (MoBLE), a decoder-binding property in Transformer autoencoders formalized as Zero-Shot Decoder Non-Transferability (ZSDN). In identity tasks using iso-architectural models trained on identical data but differing in seeds, self-decoding achieves more than 0.91 exact match and 0.98 token accuracy, while zero-shot cross-decoding collapses to chance without exact matches. This separation arises without injected secrets or adversarial training, and is corroborated by weight-space distances and attention-divergence diagnostics. We interpret ZSDN as model binding, a latent-based authentication and access-control mechanism, even when the architecture and training recipe are public: the encoder's hidden-state representation deterministically reveals the plaintext, yet only the correctly keyed decoder reproduces it in the zero-shot setting. We formally define ZSDN, a…
