Where Should LoRA Go? Component-Type Placement in Hybrid Language Models

Hector Borobia; Elies Seguí-Mas; Guillermina Tormo-Carbó

arXiv:2604.22127·cs.CL·April 27, 2026

TL;DR

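This paper studies where to place LoRA adapters in hybrid language models that interleave attention with recurrent components. Adapting only the attention pathway outperforms full-model adaptation with 5-10x fewer trainable parameters, and whether adapting the recurrent backbone helps or hurts depends on whether the hybrid is sequential or parallel.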

Contribution

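A systematic study of component-type LoRA placement across two hybrid architectures, Qwen3.5-0.8B (sequential) and Falcon-H1-0.5B (parallel), fine-tuned on three domains and evaluated on five benchmarks. The results indicate that adapter placement should account for the hybrid's topology.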

Findings

01

Adapting only the attention pathway outperforms full-model adaptation with 5-10x fewer trainable parameters.

02

Adapting the recurrent backbone is harmful in sequential hybrids (-14.8 pp on GSM8K) but beneficial in parallel hybrids (+8.6 pp).

03

Parallel hybrids show positive transfer; sequential hybrids suffer catastrophic forgetting.

Abstract

Hybrid language models that interleave attention with recurrent components are increasingly competitive with pure Transformers, yet standard LoRA practice applies adapters uniformly without considering the distinct functional roles of each component type. We systematically study component-type LoRA placement across two hybrid architectures -- Qwen3.5-0.8B (sequential, GatedDeltaNet + softmax attention) and Falcon-H1-0.5B (parallel, Mamba-2 SSM + attention) -- fine-tuned on three domains and evaluated on five benchmarks. We find that the attention pathway -- despite being the minority component -- consistently outperforms full-model adaptation with 5-10x fewer trainable parameters. Crucially, adapting the recurrent backbone is destructive in sequential hybrids (-14.8 pp on GSM8K) but constructive in parallel ones (+8.6 pp). We further document a transfer asymmetry: parallel hybrids…
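To make "component-type placement" concrete, here is a minimal Python sketch of how adapter targets might be chosen by component type and how the LoRA parameter count changes with that choice. It is not the authors' code. The module names, layer shapes, regex patterns, and rank are all hypothetical placeholders, so the printed counts are illustrative only and do not reproduce the paper's 5-10x figure. The one fixed fact used is that rank-r LoRA on a linear layer with d_in inputs and d_out outputs adds r * (d_in + d_out) trainable parameters.

```python
import re

# Hypothetical module names and (d_in, d_out) shapes for a toy two-layer hybrid.
# Real Qwen3.5 / Falcon-H1 checkpoints use their own names and sizes.
MODULES = {
    "layers.0.linear_attn.in_proj": (1024, 2048),
    "layers.0.linear_attn.out_proj": (2048, 1024),
    "layers.0.mlp.up_proj": (1024, 3072),
    "layers.0.mlp.down_proj": (3072, 1024),
    "layers.1.self_attn.q_proj": (1024, 1024),
    "layers.1.self_attn.k_proj": (1024, 256),
    "layers.1.self_attn.v_proj": (1024, 256),
    "layers.1.self_attn.o_proj": (1024, 1024),
    "layers.1.mlp.up_proj": (1024, 3072),
    "layers.1.mlp.down_proj": (3072, 1024),
}

# Assumed naming patterns that map a module name to a component type.
COMPONENT_PATTERNS = {
    "attention": re.compile(r"\.self_attn\."),
    "recurrent": re.compile(r"\.(linear_attn|mamba)\."),
    "mlp": re.compile(r"\.mlp\."),
}


def component_of(name):
    """Return the component type of a module name, or 'other' if none match."""
    for component, pattern in COMPONENT_PATTERNS.items():
        if pattern.search(name):
            return component
    return "other"


def select_targets(modules, placement):
    """List the module names whose component type is in `placement`."""
    return [name for name in modules if component_of(name) in placement]


def lora_param_count(modules, targets, rank):
    """Trainable LoRA parameters: rank * (d_in + d_out) per adapted layer."""
    return sum(rank * sum(modules[name]) for name in targets)


RANK = 8  # assumed rank, chosen only for illustration
PLACEMENTS = {
    "attention-only": {"attention"},
    "recurrent-only": {"recurrent"},
    "full-model": {"attention", "recurrent", "mlp"},
}

for label, placement in PLACEMENTS.items():
    targets = select_targets(MODULES, placement)
    count = lora_param_count(MODULES, targets, RANK)
    print(f"{label}: {len(targets)} modules, {count} LoRA parameters")
```

In practice the list returned by `select_targets` would be handed to whatever adapter library is in use as its set of target modules. Keeping the selection as a plain name filter makes it easy to swap the patterns for a given checkpoint.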
