TL;DR
TalkLoRA introduces a communication-aware framework for MoE-based LoRA that improves routing stability and performance in large language model fine-tuning.
Contribution
It proposes expert-level communication prior to routing via a Talking Module, enhancing robustness and generalization over existing MoELoRA methods.
Findings
Outperforms vanilla LoRA and MoELoRA on multiple language tasks.
Achieves higher parameter efficiency and balanced expert routing.
Shows theoretically that expert communication smooths routing dynamics by mitigating perturbation amplification.
Abstract
Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning of Large Language Models (LLMs), and recent Mixture-of-Experts (MoE) extensions further enhance flexibility by dynamically combining multiple LoRA experts. However, existing MoE-augmented LoRA methods assume that experts operate independently, often leading to unstable routing and expert dominance. In this paper, we propose TalkLoRA, a communication-aware MoELoRA framework that relaxes this independence assumption by introducing expert-level communication prior to routing. TalkLoRA equips low-rank experts with a lightweight Talking Module that enables controlled information exchange across expert subspaces, producing a more robust global signal for routing. Theoretically, we show that expert communication smooths routing dynamics by mitigating perturbation amplification while strictly generalizing existing MoELoRA…
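The idea of letting experts exchange information before routing can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `talk_moe_lora_delta`, the mixing matrix `talk`, the router weights `W_gate`, and the choice to route from the mixed expert states are all assumptions made for illustration, since the abstract does not specify the form of the Talking Module.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def talk_moe_lora_delta(x, A, B, talk, W_gate):
    """Hypothetical communication-aware MoE-LoRA update for one token.

    x:      (d_in,)        input activation
    A:      (K, r, d_in)   down-projections of the K LoRA experts
    B:      (K, d_out, r)  up-projections of the K LoRA experts
    talk:   (K, K)         assumed mixing matrix (the "talking" step)
    W_gate: (K, K * r)     assumed router that reads all mixed expert states
    """
    h = A @ x                                   # (K, r): each expert's low-rank state
    h_mixed = talk @ h                          # (K, r): experts exchange information
    gates = softmax(W_gate @ h_mixed.reshape(-1))      # (K,): routing weights
    expert_out = np.einsum("kor,kr->ko", B, h_mixed)   # (K, d_out): per-expert outputs
    return gates @ expert_out, gates            # gated sum of expert outputs

# Toy usage with random weights (all sizes are illustrative).
rng = np.random.default_rng(0)
K, r, d_in, d_out = 4, 2, 8, 8
x = rng.normal(size=d_in)
A = rng.normal(size=(K, r, d_in))
B = rng.normal(size=(K, d_out, r))
W_gate = rng.normal(size=(K, K * r))
# Mostly self-attention per expert plus a small shared average across experts.
talk = 0.7 * np.eye(K) + 0.3 * np.full((K, K), 1.0 / K)
delta, gates = talk_moe_lora_delta(x, A, B, talk, W_gate)
```

In this sketch, setting `talk` to the identity matrix makes the mixing step a no-op, so each expert's state depends only on its own projection. That mirrors, in a loose sense, the abstract's statement that the framework generalizes independent-expert MoELoRA.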
