Seed-Induced Uniqueness in Transformer Models: Subspace Alignment Governs Subliminal Transfer

Ayşe Selin Okatan; Mustafa İlhan Akbaş; Laxima Niure Kandel; Berker Peköz

arXiv:2511.01023 · eess.SP · February 17, 2026

TL;DR

This paper shows that subliminal trait transfer in Transformer models depends on alignment within a trait-discriminative subspace rather than on global representational similarity, and proposes subspace-aware diagnostics and controls that reduce trait leakage.

Contribution

It shows that seed-induced alignment within a trait-discriminative subspace governs subliminal transfer, and develops subspace-aware diagnostics (a subspace-level CKA and residualized probes) to improve model security.

Findings

01

Transfer strength depends on trait-discriminative subspace alignment.

02

Global CKA does not fully capture subliminal transfer; subspace alignment is key.

03

Security controls can reduce leakage without harming main-task performance.
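Finding 02 contrasts global CKA with alignment inside a trait-discriminative subspace. The toy sketch below is not the authors' code. It uses the standard linear CKA of Kornblith et al. (2019), $\mathrm{CKA}(X,Y) = \|Y^\top X\|_F^2 / (\|X^\top X\|_F\,\|Y^\top Y\|_F)$ for column-centered $X, Y$, and assumes that a "subspace CKA" means linear CKA computed after projecting each representation onto an orthonormal basis of its trait subspace. The helper names `linear_cka` and `subspace_cka`, the synthetic teacher and student stand-ins, and the trait axis known by construction are all hypothetical.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two (n_samples, dim) representation matrices."""
    X = X - X.mean(axis=0, keepdims=True)  # column-center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    return float(cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")))

def subspace_cka(X, Y, Ux, Uy) -> float:
    """Hypothetical 'subspace CKA': linear CKA after projecting each
    representation onto its own orthonormal basis (columns of Ux / Uy)."""
    return linear_cka(X @ Ux, Y @ Uy)

rng = np.random.default_rng(0)
n, d = 2000, 16
Z = rng.standard_normal((n, d))                   # shared "public" content
t = rng.choice([-1.0, 1.0], size=n)               # private trait label
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal rotation

# Toy stand-ins (assumption): the last coordinate is the trait axis.
teacher = np.column_stack([Z, 0.5 * t])
same_seed = np.column_stack([Z @ Q, 0.5 * t + 0.1 * rng.standard_normal(n)])
diff_seed = np.column_stack([Z @ Q, 0.5 * rng.standard_normal(n)])  # trait axis is pure noise

e_trait = np.zeros((d + 1, 1))
e_trait[-1, 0] = 1.0  # trait subspace known by construction in this toy

for name, S in [("same-seed", same_seed), ("different-seed", diff_seed)]:
    print(f"{name}: global CKA = {linear_cka(teacher, S):.3f}, "
          f"subspace CKA = {subspace_cka(teacher, S, e_trait, e_trait):.3f}")
```

Both stand-ins score above 0.9 on global CKA because they carry the same (rotated) public content, yet only the same-seed stand-in aligns with the teacher on the trait axis; the different-seed stand-in's subspace CKA is near zero. In real models the trait subspace would have to be estimated, for example from probe weights, rather than read off a known coordinate.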

Abstract

We analyze subliminal transfer in Transformer models, where a teacher embeds hidden traits that a student can linearly decode without degrading main-task performance. Prior work often attributes transferability to global representational similarity, typically quantified with Centered Kernel Alignment (CKA). Using synthetic corpora with disentangled public and private labels, we distill students under matched and independent random initializations. We find that transfer strength hinges on alignment within a trait-discriminative subspace: same-seed students inherit this alignment and show higher leakage ($\tau \approx 0.24$), whereas different-seed students, despite global CKA $> 0.9$, exhibit substantially reduced excess accuracy ($\tau \approx 0.12$ to $0.13$). We formalize this with a subspace-level CKA diagnostic and residualized probes, showing that leakage tracks alignment within…
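The abstract reports leakage as excess probe accuracy $\tau$, but this excerpt does not define the probe or its baseline. The sketch below is one plausible reading, not the paper's procedure: it assumes a residualized probe removes public-label information by subtracting per-public-class means (fit on the training split), trains a ridge-regularized linear probe for the private label, and reports test accuracy minus the majority-class rate. The function names, hyperparameters, and the synthetic "leaky" and "clean" students are hypothetical.

```python
import numpy as np

def residualize(H_tr, pub_tr, H_te, pub_te):
    """Assumed residualization: subtract per-public-class means estimated on
    the training split, so the probe cannot simply read the public label."""
    H_tr, H_te = H_tr.copy(), H_te.copy()
    for c in np.unique(pub_tr):
        mu = H_tr[pub_tr == c].mean(axis=0)
        H_tr[pub_tr == c] -= mu
        H_te[pub_te == c] -= mu
    return H_tr, H_te

def excess_probe_accuracy(H, pub, priv, train_frac=0.5, ridge=1e-2, seed=0):
    """Hypothetical tau: residualized linear-probe accuracy on the private
    label minus the majority-class rate on the test split."""
    idx = np.random.default_rng(seed).permutation(len(H))
    n_tr = int(train_frac * len(H))
    tr, te = idx[:n_tr], idx[n_tr:]
    H_tr, H_te = residualize(H[tr], pub[tr], H[te], pub[te])
    X_tr = np.column_stack([H_tr, np.ones(len(tr))])  # append a bias column
    X_te = np.column_stack([H_te, np.ones(len(te))])
    y = 2.0 * priv[tr] - 1.0                           # map {0, 1} -> {-1, +1}
    # Ridge least-squares probe (a simple stand-in for logistic regression)
    w = np.linalg.solve(X_tr.T @ X_tr + ridge * np.eye(X_tr.shape[1]), X_tr.T @ y)
    acc = np.mean((X_te @ w > 0) == (priv[te] == 1))
    p1 = priv[te].mean()
    return float(acc - max(p1, 1 - p1))

# Synthetic demo (assumption): independent public/private labels, two students.
rng = np.random.default_rng(1)
n, dim = 4000, 8
pub = rng.integers(0, 3, size=n)                       # public label, 3 classes
priv = rng.integers(0, 2, size=n)                      # private trait, 2 classes
class_means = 2.0 * rng.standard_normal((3, dim))
v = np.zeros(dim)
v[0] = 1.0                                             # direction carrying the trait
noise = rng.standard_normal((n, dim))
leaky = class_means[pub] + 0.5 * (2 * priv - 1)[:, None] * v + noise
clean = class_means[pub] + noise                       # trait not encoded at all

print(f"leaky tau = {excess_probe_accuracy(leaky, pub, priv):.3f}")
print(f"clean tau = {excess_probe_accuracy(clean, pub, priv):.3f}")
```

Residualizing first is meant to keep the probe from scoring above chance merely by exploiting public-label structure that correlates with the trait. With the independent labels in this toy it changes little, but it matters when public and private labels are not perfectly disentangled.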
