Seed-Induced Uniqueness in Transformer Models: Subspace Alignment Governs Subliminal Transfer
Ayşe Selin Okatan, Mustafa İlhan Akbaş, Laxima Niure Kandel, Berker Peköz

TL;DR
This paper reveals that subliminal trait transfer in Transformer models depends on subspace alignment rather than global similarity, and proposes diagnostics and controls to enhance model security.
Contribution
It introduces the concept of seed-induced subspace alignment governing subliminal transfer and develops subspace-aware diagnostics to improve model security.
Findings
Transfer strength depends on trait-discriminative subspace alignment.
Global CKA does not fully capture subliminal transfer; subspace alignment is key.
Security controls can reduce leakage without harming main task performance.
Abstract
We analyze subliminal transfer in Transformer models, where a teacher embeds hidden traits that can be linearly decoded by a student without degrading main-task performance. Prior work often attributes transferability to global representational similarity, typically quantified with Centered Kernel Alignment (CKA). Using synthetic corpora with disentangled public and private labels, we distill students under matched and independent random initializations. We find that transfer strength hinges on alignment within a trait-discriminative subspace: same-seed students inherit this alignment and show higher leakage (τ ≈ 0.24), whereas different-seed students, despite global CKA > 0.9, exhibit substantially reduced excess accuracy (τ ≈ 0.12 to 0.13). We formalize this with a subspace-level CKA diagnostic and residualized probes, showing that leakage tracks alignment within…
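To make the contrast between global and subspace-level CKA concrete, here is a minimal NumPy sketch. It is not the authors' diagnostic: the function names (`linear_cka`, `trait_basis`, `subspace_cka`) are hypothetical, and building the trait-discriminative subspace from an SVD of class-mean differences is an assumption chosen for illustration. The linear CKA formula itself is the standard one.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices of shape (n_samples, dim)."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (norm_x * norm_y)

def trait_basis(X, labels, k=1):
    """Assumed construction: top-k directions separating the class means."""
    classes = np.unique(labels)
    means = np.stack([X[labels == c].mean(axis=0) for c in classes])
    means = means - means.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(means, full_matrices=False)
    return vt[:k].T  # shape (dim, k)

def subspace_cka(X, Y, U_x, U_y):
    """CKA after projecting each model's activations onto its own basis."""
    return linear_cka(X @ U_x, Y @ U_y)

# Toy data: the "student" is an exact rotation of the "teacher",
# so both scores should be close to 1.
rng = np.random.default_rng(0)
n, d = 200, 16
labels = rng.integers(0, 2, size=n)
teacher = rng.normal(size=(n, d))
teacher[:, 0] += 2.0 * labels            # inject a linearly decodable trait
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
student = teacher @ Q                     # orthogonal rotation of the teacher

global_score = linear_cka(teacher, student)
sub_score = subspace_cka(
    teacher, student,
    trait_basis(teacher, labels), trait_basis(student, labels),
)
print(global_score, sub_score)
```

In this toy setup the two scores agree because the student is a pure rotation of the teacher. The abstract's claim concerns the case where they diverge: a high global score alongside weaker alignment in the trait subspace.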
