LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing

Wenbing Li; Zikai Song; Hang Zhou; Yunyao Zhang; Junqing Yu; Wei Yang

arXiv:2507.00029 · cs.LG · May 14, 2026

TL;DR

LoRA-Mixer introduces a modular, token-level routing framework for multi-task LLM adaptation, improving efficiency and specialization by integrating LoRA experts into attention layers with adaptive routing.

Contribution

The paper proposes a fine-grained, token-level routing method that places LoRA experts in the input and output projection layers of the attention module, aiming to improve multi-task performance and parameter efficiency.

Findings

01. Outperforms state-of-the-art baselines on 15 benchmarks.

02. Uses 48% fewer trainable parameters than competitors.

03. Achieves significant accuracy improvements on GSM8K, CoLA, and ARC-C.

Abstract

Recent attempts to combine low-rank adaptation (LoRA) with mixture-of-experts (MoE) for multi-task adaptation of Large Language Models (LLMs) often replace whole attention/FFN layers with switch experts or append parallel expert branches, undermining parameter efficiency and limiting task specialization. We introduce LoRA-Mixer, a modular MoE framework that routes task-specific LoRA experts into the core projection matrices of the attention module, namely input and output linear layers, rather than primarily targeting FFN blocks. The design delivers fine-grained token-level specialization by fully exploiting the attention mechanism, while remaining drop-in compatible with Transformers and state-space models (SSMs), since linear projection layers are ubiquitous. To train robust routers from limited data while promoting stable, selective decisions and high expert reuse, LoRA-Mixer employs…
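
To make the routing idea in the abstract concrete, the following is a minimal NumPy sketch of token-level mixing of LoRA experts on a single frozen linear projection, such as an attention input or output projection. It is not the authors' implementation: the softmax router, the top-k selection, the tensor shapes, and every name (`lora_mixer_projection`, `W_router`, `top_k`, `scale`) are assumptions made for illustration, and the training objective that the truncated abstract goes on to describe is not modeled.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def lora_mixer_projection(x, W, A, B, W_router, top_k=2, scale=1.0):
    """Hypothetical token-level mixture of LoRA experts on one projection.

    x:        (tokens, d_in)   token representations
    W:        (d_out, d_in)    frozen base projection weight
    A:        (K, r, d_in)     LoRA down-projections, one per expert
    B:        (K, d_out, r)    LoRA up-projections, one per expert
    W_router: (K, d_in)        assumed linear router producing expert logits
    """
    # Per-token gate over the K experts (assumed softmax router).
    gates = softmax(x @ W_router.T)                      # (tokens, K)

    # Keep only the top_k experts per token and renormalize (assumption).
    num_experts = gates.shape[-1]
    if top_k < num_experts:
        kth = np.sort(gates, axis=-1)[:, -top_k][:, None]
        gates = np.where(gates >= kth, gates, 0.0)
        gates = gates / gates.sum(axis=-1, keepdims=True)

    base = x @ W.T                                       # frozen path
    low = np.einsum("td,krd->tkr", x, A)                 # (tokens, K, r)
    delta = np.einsum("tkr,kor->tko", low, B)            # (tokens, K, d_out)
    mixed = np.einsum("tk,tko->to", gates, delta)        # gate-weighted sum
    return base + scale * mixed, gates


# Small usage example with random data (all sizes are arbitrary).
rng = np.random.default_rng(0)
tokens, d_in, d_out, r, K = 4, 8, 8, 2, 3
x = rng.normal(size=(tokens, d_in))
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(K, r, d_in))
B = np.zeros((K, d_out, r))          # common LoRA init: update starts at zero
W_router = rng.normal(size=(K, d_in))

y, gates = lora_mixer_projection(x, W, A, B, W_router, top_k=2)
print(y.shape, gates.shape)
```

Because only the small `A`, `B`, and router matrices would be trained in a setup like this, the base weight `W` stays untouched, which is the general reason LoRA-style methods are parameter-efficient. The same function applies to any layer that is a plain linear projection, which matches the abstract's point about compatibility with both Transformers and SSMs.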

Code & Models

Repositories

hustcselwb/LoRA-Mixer (GitHub)

Videos

LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing (SlidesLive)