Sensitivity-Positional Co-Localization in GQA Transformers

Manoj Chandrashekar Rao

arXiv:2604.07766 · cs.CL · April 10, 2026

TL;DR

This paper tests whether, in GQA transformers, the layers most sensitive to task correctness coincide with the layers where positional-encoding (RoPE) adaptation matters most. It finds the opposite (anti-localization) and shows that layer-targeted interventions improve performance.

Contribution

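It introduces a correctness-differential hidden-state metric for locating task-sensitive layers and two layer-targeted adaptation methods (a layer-restricted LoRA variant and GARFA, a GQA-aware RoPE frequency adaptation). Its results contradict the co-localization hypothesis, and the targeted interventions improve benchmark performance.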
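The abstract does not spell out the correctness-differential metric. As a minimal sketch, assuming it compares hidden states from correctly and incorrectly answered examples layer by layer, one could score each layer by the Euclidean distance between the two group means and keep the top-k layers as LoRA targets. The function names `mean_vector`, `correctness_differential`, and `top_k_layers` are hypothetical, and the distance-of-means definition is an assumption, not the paper's formula.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def correctness_differential(
    correct_states: Sequence[Sequence[Vector]],
    incorrect_states: Sequence[Sequence[Vector]],
) -> List[float]:
    """Hypothetical per-layer score (an assumption, not the paper's definition):
    distance between the mean hidden state on correct vs. incorrect examples.

    correct_states[l] holds one hidden-state vector per correct example at layer l;
    incorrect_states[l] does the same for incorrect examples.
    """
    return [
        math.dist(mean_vector(c), mean_vector(i))
        for c, i in zip(correct_states, incorrect_states)
    ]


def top_k_layers(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring layers, returned in ascending layer order."""
    ranked = sorted(range(len(scores)), key=lambda layer: scores[layer], reverse=True)
    return sorted(ranked[:k])


# Toy example: 4 layers, 2-dimensional hidden states, 2 examples per group.
correct = [[[0, 0], [0, 0]], [[1, 0], [1, 0]], [[0, 0], [0, 0]], [[3, 4], [3, 4]]]
incorrect = [[[0, 0], [0, 0]], [[0, 0], [0, 0]], [[0, 1], [0, 1]], [[0, 0], [0, 0]]]
scores = correctness_differential(correct, incorrect)
print(scores)                   # [0.0, 1.0, 1.0, 5.0]
print(top_k_layers(scores, 1))  # [3]
```

In a real model the per-layer vectors would come from the network's hidden states; the selected indices would then be the only layers that receive LoRA adapters.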

Findings

01

Task-sensitive layers concentrate in the late network (layers 23-31).

02

RoPE-influential layers dominate the early network (layers 0-9).

03

Targeted interventions outperform alternative configurations by 4-16 percentage points.
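The abstract's truncated final sentence mentions a Spearman statistic. Assuming it is a rank correlation between per-layer task-sensitivity scores and per-layer RoPE-influence scores, a minimal sketch is: rank both lists (averaging ranks over ties) and take the Pearson correlation of the ranks. A strongly negative value means the layers that rank highest on one measure rank lowest on the other, which is what anti-localization looks like numerically. The scores below are invented toy values and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-layer scores (invented for illustration, not from the paper):
# task sensitivity rises with depth, RoPE influence falls with depth.
task_sensitivity = [0.1, 0.2, 0.15, 0.5, 0.8, 0.9]
rope_influence = [0.9, 0.7, 0.8, 0.3, 0.2, 0.1]
print(spearman(task_sensitivity, rope_influence))  # -1.0 (perfectly reversed ranks)
```

Averaging ranks over ties is the standard convention, so the result matches common statistics libraries when no ties occur.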

Abstract

We investigate a fundamental structural question in Grouped Query Attention (GQA) transformers: do the layers most sensitive to task correctness coincide with the layers where positional encoding adaptation has the greatest leverage? We term this the co-localization hypothesis and test it on Llama 3.1 8B, a 32-layer GQA model with a 4:1 query-to-key-value head ratio. We introduce LSLoRA, which restricts LoRA adaptation to layers identified via a novel correctness-differential hidden-state metric, and GARFA (GQA-Aware RoPE Frequency Adaptation), which attaches 8 learnable per-KV-head scalar multipliers to each targeted layer. Contrary to the co-localization hypothesis, we discover strong anti-localization: task-sensitive layers concentrate in the late network ($\ell \in \{23, \dots, 31\}$) while RoPE-influential layers dominate the early network ($\ell \in \{0, \dots, 9\}$), yielding Spearman…
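GARFA attaches 8 per-KV-head scalar multipliers to each targeted layer. One plausible reading, sketched here under stated assumptions, is that each multiplier rescales the RoPE inverse frequencies of one KV head, and that under the 4:1 GQA layout the four query heads sharing that KV head reuse the same scaled frequencies. Queries and keys in a group then stay rotated consistently, so the attention logit still depends only on relative position. The multiplier-on-frequency form, the function names, and the interleaved pairing of dimensions are assumptions (Hugging Face's Llama implementation pairs dimension i with i + d/2 instead), and Llama 3.1's own long-context frequency rescaling is ignored. In training, `kv_scales` would be learnable parameters; here they are plain floats.

```python
import math
from typing import List, Sequence


def rope_inv_freq(head_dim: int, base: float) -> List[float]:
    """Standard RoPE inverse frequencies theta_i = base^(-2i/d), i = 0 .. d/2 - 1."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def per_query_head_inv_freq(
    kv_scales: Sequence[float], num_q_heads: int, head_dim: int, base: float
) -> List[List[float]]:
    """Hypothetical GARFA-style frequencies: query head h uses the multiplier of
    the KV head it shares under GQA, i.e. kv_scales[h // group_size]."""
    num_kv_heads = len(kv_scales)
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of the number of KV heads")
    group_size = num_q_heads // num_kv_heads  # 4 for a 32:8 (4:1) layout
    base_freq = rope_inv_freq(head_dim, base)
    return [[kv_scales[h // group_size] * f for f in base_freq] for h in range(num_q_heads)]


def rotate(x: Sequence[float], pos: int, inv_freq: Sequence[float]) -> List[float]:
    """Rotate consecutive pairs (x[2i], x[2i+1]) by angle pos * inv_freq[i]."""
    out: List[float] = []
    for i, freq in enumerate(inv_freq):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(pos * freq), math.sin(pos * freq)
        out.extend([a * c - b * s, a * s + b * c])
    return out


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


# Assumed Llama-3.1-8B-like shapes: 32 query heads, 8 KV heads, head_dim 128.
scales = [1.0, 0.5, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0]  # one multiplier per KV head
freqs = per_query_head_inv_freq(scales, num_q_heads=32, head_dim=128, base=500000.0)
assert freqs[0] == freqs[3] and freqs[4] != freqs[0]  # heads 0-3 share KV head 0

# q and k in a group use the same frequencies, so the logit depends only on m - n.
q = [0.1 * i for i in range(128)]
k = [0.05 * (128 - i) for i in range(128)]
f = freqs[5]
print(math.isclose(dot(rotate(q, 7, f), rotate(k, 3, f)),
                   dot(rotate(q, 20, f), rotate(k, 16, f)), rel_tol=1e-9))  # True
```

Tying each multiplier to a KV head rather than to a query head is what keeps the count at 8 parameters per targeted layer in this reading.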
