MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models
Tongxu Luo, Jiahe Lei, Fangyu Lei, Weihao Liu, Shizhu He, Jun Zhao, and Kang Liu

TL;DR
MoELoRA introduces a contrastive learning guided mixture of experts approach to parameter-efficient fine-tuning, significantly improving performance on reasoning tasks while reducing computational costs.
Contribution
This work proposes MoELoRA, a novel PEFT method that models LoRA as a Mixture of Experts and uses contrastive learning to enhance expert specialization.
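To make the idea of treating LoRA as a Mixture of Experts concrete, here is a minimal sketch of one layer in plain Python. It is not the authors' implementation. The class name `MoELoRALayer`, the top-k softmax gate, the zero initialization of the `B` matrices, and all sizes are assumptions chosen for illustration; the summary above does not specify them.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    peak = max(z)
    exps = [math.exp(x - peak) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random Gaussian matrix as a list of rows."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class MoELoRALayer:
    """Hypothetical sketch: a frozen weight W plus routed low-rank experts.

    Each expert e is a LoRA pair (A_e, B_e) contributing B_e @ A_e @ x.
    The gate and top-k routing are illustrative assumptions.
    """

    def __init__(self, d_in, d_out, n_experts=4, rank=2, top_k=2, seed=0):
        rng = random.Random(seed)
        self.W = rand_matrix(d_out, d_in, rng)  # frozen pretrained weight
        self.A = [rand_matrix(rank, d_in, rng) for _ in range(n_experts)]
        # B starts at zero (as in standard LoRA), so the layer initially equals W.
        self.B = [[[0.0] * rank for _ in range(d_out)] for _ in range(n_experts)]
        self.gate = rand_matrix(n_experts, d_in, rng)  # router weights
        self.top_k = top_k

    def route(self, x):
        """Return the indices of the top-k experts and their softmax weights."""
        logits = matvec(self.gate, x)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top = order[: self.top_k]
        return top, softmax([logits[i] for i in top])

    def expert_output(self, e, x):
        """Low-rank update of expert e: B_e @ (A_e @ x)."""
        return matvec(self.B[e], matvec(self.A[e], x))

    def forward(self, x):
        """Frozen projection plus the gate-weighted sum of selected experts."""
        y = matvec(self.W, x)
        top, weights = self.route(x)
        for e, w in zip(top, weights):
            delta = self.expert_output(e, x)
            y = [yi + w * di for yi, di in zip(y, delta)]
        return y
```

In this sketch only `A`, `B`, and `gate` would be trained while `W` stays frozen, which is what keeps the trainable parameter count small.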
Findings
MoELoRA outperforms LoRA by 4.2% on average in math reasoning tasks.
MoELoRA achieves competitive results compared to GPT-3.5 with fewer parameters.
Contrastive learning reduces expert routing randomness, improving task performance.
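The role of contrastive learning in the last finding can be illustrated with a small InfoNCE-style loss over expert outputs. This is a sketch under stated assumptions, not the paper's exact objective: it assumes that outputs produced by the same expert act as positives and outputs from different experts act as negatives, and the function name `expert_contrastive_loss`, the cosine similarity, and the temperature value are all hypothetical choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def expert_contrastive_loss(outputs, expert_ids, temperature=0.1):
    """InfoNCE-style loss (illustrative assumption, not the paper's formula).

    outputs[i] is a vector produced by expert expert_ids[i]. For each anchor,
    other outputs of the same expert are positives and all remaining outputs
    are negatives. Anchors without any positive are skipped.
    """
    n = len(outputs)
    losses = []
    for i in range(n):
        # Exponentiated similarity to every other output (self excluded).
        sims = [
            math.exp(cosine(outputs[i], outputs[j]) / temperature) if j != i else 0.0
            for j in range(n)
        ]
        positives = sum(
            s for j, s in enumerate(sims) if j != i and expert_ids[j] == expert_ids[i]
        )
        if positives > 0.0:
            losses.append(-math.log(positives / sum(sims)))
    return sum(losses) / len(losses) if losses else 0.0
```

A lower value means that outputs of the same expert cluster together and differ from those of other experts. In a training setup of this kind, such a term would typically be added to the task loss with a weighting coefficient.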
Abstract
Fine-tuning is often necessary to enhance the adaptability of Large Language Models (LLMs) to downstream tasks. Nonetheless, updating billions of parameters demands significant computational resources and training time, which poses a substantial obstacle to the widespread application of large-scale models in various scenarios. To address this issue, Parameter-Efficient Fine-Tuning (PEFT) has emerged as a prominent paradigm in recent research. However, current PEFT approaches that employ a limited set of global parameters (such as LoRA, which adds low-rank approximation matrices to all weights) face challenges in flexibly combining different computational modules in downstream tasks. In this work, we introduce a novel PEFT method: MoELoRA. We consider LoRA as a Mixture of Experts (MoE), and to mitigate the random routing phenomenon observed in MoE, we propose the utilization…
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems
Methods: Sparse Evolutionary Training · Linear Layer · Byte Pair Encoding · Attention Dropout · Dense Connections · Cosine Annealing · Adam
