Some Attention is All You Need for Retrieval

Felix Michalak; Steven Abreu

arXiv:2510.19861·cs.LG·October 24, 2025

Open Access

TL;DR

This paper shows that in hybrid SSM-Transformer models, retrieval relies solely on self-attention layers, and that sparsifying attention to 15% of heads maintains near-perfect retrieval, revealing functional specialization within the architecture.

Contribution

It demonstrates complete segregation of retrieval function to self-attention layers and identifies mechanistic requirements for retrieval in hybrid models.

Findings

01 Retrieval depends exclusively on self-attention layers.

02 Sparsifying attention to 15% of heads retains near-perfect retrieval.

03 Hybrid models operate as specialized modules rather than integrated systems.
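
The retrieval findings above refer to needle-style tests, in which a target fact is hidden in filler text and the model is asked to recall it. The following is a minimal sketch of how such a test might be assembled and scored. The function names, the filler sentences, and the substring-match scoring rule are all illustrative assumptions and are not taken from the paper.

```python
import random


def build_haystack(needle: str, filler: list[str], n_sentences: int,
                   depth: float, seed: int = 0) -> str:
    """Hypothetical helper: hide `needle` among randomly chosen filler sentences.

    `depth` is the relative insertion position (0.0 = start, 1.0 = end).
    """
    rng = random.Random(seed)
    sentences = [rng.choice(filler) for _ in range(n_sentences)]
    position = round(depth * n_sentences)
    sentences.insert(position, needle)
    return " ".join(sentences)


def retrieval_accuracy(answers: list[str], expected: str) -> float:
    """Assumed scoring rule: an answer counts as correct if it contains
    `expected` (case-insensitive). Returns the fraction of correct answers."""
    if not answers:
        return 0.0
    hits = sum(expected.lower() in answer.lower() for answer in answers)
    return hits / len(answers)


# Example: a needle placed halfway through ten filler sentences.
context = build_haystack(
    needle="The secret number is 7421.",
    filler=["The sky is blue.", "Grass is green."],
    n_sentences=10,
    depth=0.5,
)
# Placeholder strings standing in for model generations.
score = retrieval_accuracy(["It is 7421.", "I do not know."], "7421")
```

Under this scoring rule, the 0% accuracy reported for attention ablation would correspond to no generation containing the needle value at all.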

Abstract

We demonstrate complete functional segregation in hybrid SSM-Transformer architectures: retrieval depends exclusively on self-attention layers. Across RecurrentGemma-2B/9B and Jamba-Mini-1.6, attention ablation causes catastrophic retrieval failure (0% accuracy), while SSM layers show no compensatory mechanisms even with improved prompting. Conversely, sparsifying attention to just 15% of heads maintains near-perfect retrieval while preserving 84% MMLU performance, suggesting self-attention specializes primarily for retrieval tasks. We identify precise mechanistic requirements for retrieval: needle tokens must be exposed during generation and sufficient context must be available during prefill or generation. This strict functional specialization challenges assumptions about redundancy in hybrid architectures and suggests these models operate as specialized modules rather than integrated…
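
The attention ablation and 15% head sparsification described in the abstract can be illustrated with a small, framework-free sketch. It assumes per-head output vectors are available as plain lists, picks the kept heads at random, and ablates by zeroing. All three are simplifying assumptions for illustration: the abstract does not state how heads were selected or how ablation was implemented, and `select_heads` and `mask_head_outputs` are hypothetical names.

```python
import random


def select_heads(n_layers: int, n_heads: int, keep_fraction: float,
                 seed: int = 0) -> set[tuple[int, int]]:
    """Return the (layer, head) pairs to keep; all others count as ablated.

    Random selection is a placeholder, not the paper's selection criterion.
    """
    all_heads = [(layer, head)
                 for layer in range(n_layers)
                 for head in range(n_heads)]
    n_keep = round(keep_fraction * len(all_heads))
    rng = random.Random(seed)
    return set(rng.sample(all_heads, n_keep))


def mask_head_outputs(head_outputs: list[list[float]], layer: int,
                      kept: set[tuple[int, int]]) -> list[list[float]]:
    """Zero the output vector of every head in `layer` that is not in `kept`.

    Zero-ablation is an assumption; other ablation schemes are possible.
    """
    return [
        vec if (layer, head) in kept else [0.0] * len(vec)
        for head, vec in enumerate(head_outputs)
    ]


# Keep 15% of heads in a toy model with 8 attention layers of 10 heads each.
kept = select_heads(n_layers=8, n_heads=10, keep_fraction=0.15)
layer0 = mask_head_outputs([[1.0, 2.0]] * 10, layer=0, kept=kept)

# Full attention ablation corresponds to keeping no heads at all.
none_kept = select_heads(n_layers=8, n_heads=10, keep_fraction=0.0)
```

In a real hybrid model the masking would be applied inside the self-attention layers only, leaving the SSM layers untouched, which is what lets the two components be tested separately.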

Taxonomy

Topics: Information Retrieval and Search Behavior · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques