Higher Layers Need More LoRA Experts

Chongyang Gao; Kezhen Chen; Jinmeng Rao; Baochen Sun; Ruibo Liu; Daiyi Peng; Yawen Zhang; Xiaoyuan Guo; Jie Yang; VS Subrahmanian

arXiv:2402.08562 · cs.CL · February 14, 2024 · 3 cites

TL;DR

This paper introduces MoLA (MoE-LoRA with Layer-wise Expert Allocation), a parameter-efficient tuning method for Transformer models in which each layer can use a different number of LoRA experts. Assigning more experts to higher layers improves both performance and efficiency, and MoLA outperforms baselines on several NLP benchmarks.

Contribution

The paper proposes a layer-wise expert allocation strategy for Mixture-of-Experts LoRA (MoE-LoRA), in which the number of LoRA experts can differ from layer to layer, improving both the efficiency and the performance of parameter-efficient tuning.
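The layer-wise idea can be written as a generic MoE-LoRA update for the adapted projection in layer l. This is a sketch under assumptions this page does not state: a linear softmax router W_r^(l), top-K expert selection with renormalized gates, and LoRA factors A_i, B_i of rank r. N_l, the number of LoRA experts at layer l, is the quantity that MoLA lets vary with depth.

```latex
h^{(l)} = W_0^{(l)} x + \sum_{i \in \mathcal{T}_K(x)} \frac{p_i^{(l)}(x)}{\sum_{j \in \mathcal{T}_K(x)} p_j^{(l)}(x)} \, B_i^{(l)} A_i^{(l)} x,
\qquad p^{(l)}(x) = \operatorname{softmax}\big(W_r^{(l)} x\big) \in \mathbb{R}^{N_l},
\qquad A_i^{(l)} \in \mathbb{R}^{r \times d_{\text{in}}},\; B_i^{(l)} \in \mathbb{R}^{d_{\text{out}} \times r}
```

Here W_0^(l) is the frozen pretrained weight and T_K(x) is the set of indices of the K largest entries of p^(l)(x). Under the paper's main finding, N_l would be larger for higher layers than for lower ones.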

Findings

01

Allocating more LoRA experts to higher layers improves model performance.

02

MoLA outperforms baselines with fewer parameters.

03

Layer-wise expert configuration is effective across NLP benchmarks.
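The allocation finding can be illustrated with a small, dependency-free Python sketch that expands per-group expert counts into a per-layer list and counts the resulting LoRA parameters. The group sizes, the counts `[5, 5, 5, 5]` and `[2, 4, 6, 8]`, the 32-layer depth, and the helper names are illustrative assumptions chosen here, not values taken from this page; the parameter count also assumes one adapted square projection per layer and ignores router weights.

```python
from typing import List


def expand_group_allocation(num_layers: int, group_counts: List[int]) -> List[int]:
    """Hypothetical helper: expand per-group expert counts into per-layer counts.

    Layers are split into len(group_counts) contiguous, equal-sized groups,
    lowest layers first; every layer in group g gets group_counts[g] experts.
    """
    num_groups = len(group_counts)
    if num_layers % num_groups != 0:
        raise ValueError("num_layers must be divisible by the number of groups")
    per_group = num_layers // num_groups
    return [count for count in group_counts for _ in range(per_group)]


def lora_params_per_expert(d_in: int, d_out: int, rank: int) -> int:
    """One LoRA expert holds A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def total_lora_params(experts_per_layer: List[int], d_in: int, d_out: int, rank: int) -> int:
    # Assumption: a single adapted projection per layer; router weights ignored.
    return sum(experts_per_layer) * lora_params_per_expert(d_in, d_out, rank)


uniform = expand_group_allocation(32, [5, 5, 5, 5])       # same count in every layer
higher_heavy = expand_group_allocation(32, [2, 4, 6, 8])  # more experts in higher layers

# Same total expert budget, distributed differently across depth.
assert sum(uniform) == sum(higher_heavy) == 160
print(total_lora_params(higher_heavy, d_in=4096, d_out=4096, rank=8))  # 10485760
```

Because both allocations spend the same total budget, any difference in downstream quality would come from where the experts sit rather than how many there are, which is the comparison the layer-wise question calls for.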

Abstract

Parameter-efficient tuning (PEFT) techniques like low-rank adaptation (LoRA) offer training efficiency on Large Language Models, but their impact on model performance remains limited. Recent efforts integrate LoRA and Mixture-of-Experts (MoE) to improve the performance of PEFT methods. Despite promising results, research on improving the efficiency of LoRA with MoE is still in its early stages. Recent studies have shown that experts in the MoE architecture have different strengths and also exhibit some redundancy. Does this statement also apply to parameter-efficient MoE? In this paper, we introduce a novel parameter-efficient MoE method, MoE-LoRA with Layer-wise Expert Allocation (MoLA) for Transformer-based models, where each model layer has the flexibility to employ a varying number of LoRA experts. We investigate several architectures…
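To make the abstract's description concrete, here is a minimal pure-Python sketch of an MoE-LoRA projection and a small stack whose layers use different numbers of LoRA experts. It is not the gcyzsl/mola code (that repository is PyTorch); the class name `MoELoRALinear`, the softmax router with top-k selection and renormalized gates, and the standard LoRA initialization with B set to zero are all assumptions made for illustration.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z: Vector) -> Vector:
    peak = max(z)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in z]
    total = sum(exps)
    return [e / total for e in exps]


class MoELoRALinear:
    """Hypothetical layer: frozen W0 plus a top-k routed mixture of LoRA experts B_i A_i."""

    def __init__(self, d_in: int, d_out: int, num_experts: int, rank: int, top_k: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.top_k = min(top_k, num_experts)
        self.w0 = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]  # frozen
        self.router = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(num_experts)]
        # LoRA-style init: A random, B zero, so every expert's update starts at zero.
        self.a = [[[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)] for _ in range(num_experts)]
        self.b = [[[0.0] * rank for _ in range(d_out)] for _ in range(num_experts)]

    def forward(self, x: Vector) -> Vector:
        out = matvec(self.w0, x)
        probs = softmax(matvec(self.router, x))
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.top_k]
        norm = sum(probs[i] for i in top)  # renormalize gates over the selected experts
        for i in top:
            delta = matvec(self.b[i], matvec(self.a[i], x))  # B_i (A_i x)
            gate = probs[i] / norm
            out = [o + gate * d for o, d in zip(out, delta)]
        return out


# Hypothetical allocation: more experts in the higher layers of a 4-layer stack.
counts = [2, 2, 4, 4]
layers = [MoELoRALinear(8, 8, n, rank=2, top_k=2, seed=k) for k, n in enumerate(counts)]
h = [1.0] * 8
for layer in layers:
    h = layer.forward(h)
print(len(h))  # 8
```

The sketch routes a single vector; a real implementation would route every token's hidden state in a batch and train only the router and LoRA factors while keeping W0 frozen.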

Code & Models

Repositories

gcyzsl/mola (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Multimodal Machine Learning Applications