Borrowed Geometry: Cross-Distribution Head-Importance Fingerprints of Frozen Pretrained Gemma 4 31B

Abay Bektursun

arXiv:2605.00333·cs.LG·May 20, 2026

TL;DR

This paper investigates how frozen Gemma 4 31B weights, pretrained only on text, transfer to non-text tasks, and identifies specific attention heads whose importance holds across distributions and is supported causally by ablation.

Contribution

It introduces a cross-distribution importance fingerprint and provides causal validation of specific heads' roles in non-language tasks.

Findings

01

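Four attention heads (L26.28, L27.28, L27.2, L27.3) rank top-tier on both an English-text attention probe and per-head ablation impact on four non-language token-pattern tasks.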

02

Head L26.28 significantly affects performance on OGBench cube-double-play-task1.

03

Head ablation results demonstrate the causal importance of specific heads.

Abstract

Frozen Gemma 4 31B weights, pretrained exclusively on text and left unmodified, transfer through a thin trainable interface to non-text modalities the substrate has never processed. On the L24--L29 slice (192 attention heads), an English-text TxtCopy attention probe (95 sentences) and per-head ablation impact on four non-language token-pattern tasks (binary copy, associative recall, 1D cellular automaton Rule 90, binary addition) jointly classify four heads -- L26.28, L27.28, L27.2, L27.3 -- as top-tier on both signals. The slice-level joint coincidence is significant under a hypergeometric null ($P = 0.0013$, $N = 192$, $K = 38$, $n = 4$) and survives multiplicity-aware permutation tests ($P_{V4} = 0.013$). Pretrained Gemma L26 reaches 60.22% on OGBench cube-double-play-task1 vs ~1% for random-init Gemma ($+59$ pt at $n = 3$); a FrozenRandom-GPT2 control with correct $1/d_{k}$ scaling also fails.…
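
The hypergeometric figure quoted in the abstract can be sanity-checked in a few lines of Python. The sketch below assumes one reading of the reported parameters, which the abstract does not spell out: $N = 192$ heads in the slice, $K = 38$ of them top-tier on one signal, $n = 4$ heads selected by the other signal, and all $k = 4$ of those landing in the top tier. The function name `hypergeom_tail` and the value of `k` are illustrative assumptions, not taken from the paper.

```python
from math import comb


def hypergeom_tail(N: int, K: int, n: int, k: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: population size, K: marked items in the population,
    n: draws without replacement, k: minimum number of marked draws.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# N, K, n are the values quoted in the abstract.
# k = 4 (all four selected heads coincide) is an ASSUMED reading.
N, K, n, k = 192, 38, 4, 4
p = hypergeom_tail(N, K, n, k)
print(f"P(X >= {k}) = {p:.4f}")  # prints "P(X >= 4) = 0.0013"
```

Under this reading the tail probability reduces to $\binom{38}{4} / \binom{192}{4} \approx 0.0013$, which agrees with the reported value. Any other assignment of $K$ and $n$ would need the full paper to confirm.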
