Two Is Better Than One: Rotations Scale LoRAs

Hongcan Guo; Guoshun Nan; Yuan Yang; Diyang Zhang; Haotian Li; Zhican Chen; Qinchuan Zhou; Yuhan Ran; Xinye Cao; Sicong Leng; Xiaofeng Tao; and Xudong Jiang

arXiv:2505.23184·cs.LG·May 30, 2025

TL;DR

This paper introduces RadarGate, a novel rotational gating mechanism for LoRA-based Mixture-of-Experts in large language models, enhancing expressiveness and scalability by enabling richer feature interactions and better generalization.

Contribution

The paper proposes RadarGate, a geometrically inspired gating method that uses rotations of LoRA representations to improve expressiveness and scalability in LLMs, addressing limitations of existing gating mechanisms.

Findings

01. RadarGate improves performance across 6 benchmarks and 21 tasks.

02. Rotations encourage semantic alignment of similar representations.

03. The method effectively mitigates underfitting and poor generalization issues.

Abstract

Scaling Low-Rank Adaptation (LoRA)-based Mixture-of-Experts (MoE) helps large language models (LLMs) adapt efficiently to diverse tasks. However, traditional gating mechanisms that route inputs to the best experts may fundamentally hinder LLMs' scalability, leading to poor generalization and underfitting. We identify the root cause as the restricted expressiveness of existing weighted-sum mechanisms, both within and outside the convex cone of LoRA representations. This motivates us to propose RadarGate, a novel geometrically inspired gating method that applies rotational operations to LoRA representations to boost expressiveness and facilitate richer feature interactions among multiple LoRAs for scalable LLMs. Specifically, we first fuse each LoRA representation with those of the other LoRAs using a learnable component and then feed the output to a rotation matrix.…
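The geometric point in the abstract can be illustrated with a toy sketch. It is not the paper's formulation: the 2-D vectors, the `fuse` mixing matrix, the fixed angles in `thetas`, and all function names are assumptions made for illustration. It shows only that a softmax-weighted sum of LoRA representations stays inside their convex cone, while rotating the (fused) representations before summing can produce an output outside it.

```python
import math

def softmax(logits):
    # Numerically stable softmax; the weights are non-negative and sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    # Linear combination of equal-length vectors.
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

def rotate_2d(vec, theta):
    # Counter-clockwise rotation of a 2-D vector by theta radians.
    c, s = math.cos(theta), math.sin(theta)
    return [c * vec[0] - s * vec[1], s * vec[0] + c * vec[1]]

def plain_gate(reps, logits):
    # Conventional gating: non-negative weights keep the output
    # inside the convex cone spanned by the LoRA representations.
    return weighted_sum(softmax(logits), reps)

def rotated_gate(reps, logits, fuse, thetas):
    # Hypothetical stand-in for "fuse, then rotate":
    # `fuse` mixes each representation with the others (assumed form),
    # then each fused vector is rotated before the weighted sum.
    fused = [weighted_sum(row, reps) for row in fuse]
    rotated = [rotate_2d(f, t) for f, t in zip(fused, thetas)]
    return weighted_sum(softmax(logits), rotated)

# Two toy LoRA outputs; their convex cone is the non-negative quadrant.
reps = [[1.0, 0.0], [0.0, 1.0]]
logits = [0.0, 0.0]                       # equal gate weights
identity_fuse = [[1.0, 0.0], [0.0, 1.0]]  # no cross-LoRA mixing in this toy case

plain = plain_gate(reps, logits)
rotated = rotated_gate(reps, logits, identity_fuse, [math.pi / 2, math.pi / 2])

print(plain)    # stays in the non-negative quadrant
print(rotated)  # first coordinate is negative, so it lies outside the cone
```

In a trained model the mixing weights and rotations would presumably be learned rather than fixed; the constants here exist only to make the geometry visible.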
