TL;DR
LoRA-Mixer is a modular mixture-of-experts framework for multi-task LLM adaptation. It routes task-specific LoRA experts, token by token, into the input and output projection layers of the attention module, aiming for better parameter efficiency and task specialization.
Contribution
It proposes a novel, fine-grained routing method for LoRA experts in attention modules, enhancing multi-task performance and parameter efficiency.
Findings
Outperforms state-of-the-art baselines on 15 benchmarks.
Uses 48% fewer trainable parameters than competitors.
Achieves significant accuracy improvements on GSM8K, CoLA, and ARC-C.
Abstract
Recent attempts to combine low-rank adaptation (LoRA) with mixture-of-experts (MoE) for multi-task adaptation of Large Language Models (LLMs) often replace whole attention/FFN layers with switch experts or append parallel expert branches, undermining parameter efficiency and limiting task specialization. We introduce LoRA-Mixer, a modular MoE framework that routes task-specific LoRA experts into the core projection matrices of the attention module, namely input and output linear layers, rather than primarily targeting FFN blocks. The design delivers fine-grained token-level specialization by fully exploiting the attention mechanism, while remaining drop-in compatible with Transformers and state-space models (SSMs), since linear projection layers are ubiquitous. To train robust routers from limited data while promoting stable, selective decisions and high expert reuse, LoRA-Mixer employs…
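The core idea in the abstract, mixing several LoRA experts into a single frozen projection matrix with per-token routing weights, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `LoRAMixedLinear`, the plain softmax router, the initialization, and all hyperparameters are assumptions chosen for illustration, and the abstract is truncated before it describes how the router is actually trained.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class LoRAMixedLinear:
    """Hypothetical frozen projection plus token-routed LoRA experts."""

    def __init__(self, d_in, d_out, num_experts=3, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen base projection (e.g. an attention input or output projection).
        self.W = rng.normal(0.0, 0.02, size=(d_in, d_out))
        # One low-rank pair (A_e, B_e) per expert; B starts at zero, as in standard LoRA.
        self.A = rng.normal(0.0, 0.02, size=(num_experts, d_in, rank))
        self.B = np.zeros((num_experts, rank, d_out))
        # Assumed router: a single linear map from token features to expert logits.
        self.W_router = rng.normal(0.0, 0.02, size=(d_in, num_experts))
        self.scale = alpha / rank

    def route(self, x):
        # Per-token gate weights, shape (tokens, experts); each row sums to 1.
        return softmax(x @ self.W_router, axis=-1)

    def __call__(self, x):
        gates = self.route(x)
        base = x @ self.W
        # Low-rank update from every expert: shape (experts, tokens, d_out).
        low = np.einsum("td,edr->etr", x, self.A)
        delta = np.einsum("etr,ero->eto", low, self.B)
        # Token-level mixture of the expert updates.
        mixed = np.einsum("te,eto->to", gates, delta)
        return base + self.scale * mixed

layer = LoRAMixedLinear(d_in=8, d_out=6)
tokens = np.random.default_rng(1).normal(size=(5, 8))
out = layer(tokens)  # shape (5, 6)
```

Because every `B` matrix starts at zero, the layer initially reproduces the frozen projection exactly, so adding the experts does not change the base model's output until they are trained. Only `A`, `B`, and `W_router` would be trainable in this sketch, while `W` stays fixed.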
