Functional Component Ablation Reveals Specialization Patterns in Hybrid Language Model Architectures

Hector Borobia; Elies Seguí-Mas; Guillermina Tormo-Carbó

arXiv:2603.22473·cs.CL·March 25, 2026

Open Access

TL;DR

This study investigates the functional roles of components in hybrid language models. It finds that both the attention and the linear attention or SSM components are essential, that the linear attention or SSM component is the primary language modeling backbone, and that early layers are the most critical.

Contribution

The paper introduces a functional component ablation framework for hybrid language models and uses it to show that both component types matter and that their importance depends on layer position, with implications for model design and robustness.

Findings

1. Both attention and SSM/linear attention components are essential.

2. The linear attention or SSM component forms the primary backbone: removing it causes a >35,000x perplexity increase, versus ~82x for removing attention.

3. Early layers are disproportionately critical, with importance decreasing in later layers.

Abstract

Hybrid language models combining attention with state space models (SSMs) or linear attention offer improved efficiency, but whether both components are genuinely utilized remains unclear. We present a functional component ablation framework applied to two sub-1B hybrid models -- Qwen3.5-0.8B (sequential: Gated DeltaNet + softmax attention) and Falcon-H1-0.5B (parallel: Mamba-2 + attention) -- with a pure Transformer control (Qwen2.5-0.5B). Through group ablations, layer-wise sweeps, positional ablations, matched random controls, and perplexity analysis across five benchmarks, we establish four findings: (1) both component types are essential and neither is bypassed; (2) the alternative component (linear attention or SSM) is the primary language modeling backbone, causing >35,000x perplexity degradation when removed versus ~82x for attention; (3) component importance follows a…
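
The perplexity degradation factors quoted in the abstract can be read as ratios of ablated to baseline perplexity. The following is a minimal sketch of that arithmetic, assuming perplexity is computed as the exponential of the mean per-token negative log-likelihood (in nats). The function names and the sample values are hypothetical illustrations, not the authors' code or data.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def degradation_ratio(baseline_nlls, ablated_nlls):
    """How many times larger the ablated perplexity is than the baseline."""
    return perplexity(ablated_nlls) / perplexity(baseline_nlls)


# Hypothetical per-token NLLs, made up for illustration (not from the paper).
baseline = [2.0, 2.5, 3.0]

# Shifting every token's NLL by ln(82) multiplies perplexity by 82.
ablated = [nll + math.log(82) for nll in baseline]

ratio = degradation_ratio(baseline, ablated)
print(f"degradation: {ratio:.1f}x")
```

Under this definition, a degradation factor of k corresponds to a rise of ln(k) nats in mean per-token loss: roughly 4.4 nats for ~82x and roughly 10.5 nats for 35,000x.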

Taxonomy

Topics: Topic Modeling · Software System Performance and Reliability · Adversarial Robustness in Machine Learning