Two Is Better Than One: Rotations Scale LoRAs
Hongcan Guo, Guoshun Nan, Yuan Yang, Diyang Zhang, Haotian Li, Zhican Chen, Qinchuan Zhou, Yuhan Ran, Xinye Cao, Sicong Leng, Xiaofeng Tao, and Xudong Jiang

TL;DR
This paper introduces RadarGate, a novel rotational gating mechanism for LoRA-based Mixture-of-Experts in large language models, enhancing expressiveness and scalability by enabling richer feature interactions and better generalization.
Contribution
The paper proposes RadarGate, a geometrically inspired gating method that uses rotations of LoRA representations to improve expressiveness and scalability in LLMs, addressing limitations of existing gating mechanisms.
Findings
RadarGate improves performance across 6 benchmarks and 21 tasks.
Rotations encourage semantic alignment of similar representations.
The method effectively mitigates underfitting and poor generalization issues.
Abstract
Scaling Low-Rank Adaptation (LoRA)-based Mixture-of-Experts (MoE) enables large language models (LLMs) to adapt efficiently to diverse tasks. However, traditional gating mechanisms that route inputs to the best experts may fundamentally hinder LLMs' scalability, leading to poor generalization and underfitting issues. We identify that the root cause lies in the restricted expressiveness of existing weighted-sum mechanisms, both within and outside the convex cone of LoRA representations. This motivates us to propose RadarGate, a novel geometrically inspired gating method that introduces rotational operations on LoRA representations to boost expressiveness and facilitate richer feature interactions among multiple LoRAs for scalable LLMs. Specifically, we first fuse each LoRA representation to other LoRAs using a learnable component and then feed the output to a rotation matrix.…
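The convex-cone limitation mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's formulation (the abstract is truncated and does not give one). It assumes two hypothetical 2-D LoRA outputs, non-negative gate weights, and a fixed per-expert rotation angle standing in for whatever a learned rotation would produce; the fusion step is omitted. All names (`weighted_sum`, `rotate2d`, `in_cone`, the angles and gate values) are illustrative assumptions.

```python
import math

def weighted_sum(reps, gates):
    """Weighted-sum gating: combine LoRA outputs using gate weights."""
    dim = len(reps[0])
    return [sum(g * r[i] for g, r in zip(gates, reps)) for i in range(dim)]

def rotate2d(vec, angle):
    """Rotate a 2-D vector counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = vec
    return [c * x - s * y, s * x + c * y]

def in_cone(vec, a, b, tol=1e-9):
    """True if vec = alpha*a + beta*b with alpha, beta >= 0 (2-D, a and b independent)."""
    det = a[0] * b[1] - a[1] * b[0]
    alpha = (vec[0] * b[1] - vec[1] * b[0]) / det
    beta = (a[0] * vec[1] - a[1] * vec[0]) / det
    return alpha >= -tol and beta >= -tol

# Hypothetical LoRA outputs and non-negative gate weights (toy values).
lora_a, lora_b = [1.0, 0.0], [0.0, 1.0]
gates = [0.7, 0.3]

# Plain weighted sum: the result cannot leave the cone spanned by lora_a and lora_b.
mixed = weighted_sum([lora_a, lora_b], gates)

# Assumed stand-in for a learned rotation: one fixed angle per expert.
angles = [-math.pi / 4, 0.0]
rotated = [rotate2d(r, t) for r, t in zip([lora_a, lora_b], angles)]
rotated_mix = weighted_sum(rotated, gates)

print(in_cone(mixed, lora_a, lora_b))
print(in_cone(rotated_mix, lora_a, lora_b))
```

With the same gate weights, the plain mixture stays inside the cone of the original representations, while the mixture of rotated representations lands outside it. This is the kind of extra reach that the abstract attributes to rotational operations.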
