MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models

Tongxu Luo, Jiahe Lei, Fangyu Lei, Weihao Liu, Shizhu He, Jun Zhao, and Kang Liu

arXiv:2402.12851·cs.CL·February 21, 2024·3 cites

TL;DR

MoELoRA introduces a contrastive learning guided mixture of experts approach to parameter-efficient fine-tuning, significantly improving performance on reasoning tasks while reducing computational costs.

Contribution

This work proposes MoELoRA, a novel PEFT method that models LoRA as a Mixture of Experts and uses contrastive learning to enhance expert specialization.
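To make the "LoRA as a Mixture of Experts" idea concrete, here is a minimal sketch of a forward pass through one frozen linear layer with several LoRA experts and a router. It is an illustration under stated assumptions, not the authors' implementation: the softmax gate, the top-k selection, the renormalization of the selected gate weights, and every name (`moe_lora_forward`, `W_gate`, `top_k`, `alpha`) are hypothetical choices made for this example.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    peak = max(z)
    exps = [math.exp(t - peak) for t in z]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian-initialised matrix."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def moe_lora_forward(x, W, experts, W_gate, top_k=2, alpha=1.0):
    """Compute y = W x + alpha * sum_i g_i * B_i (A_i x) over the top-k experts.

    W is the frozen pretrained weight; each expert is a low-rank pair (A_i, B_i).
    The gating scheme here is an assumption made for illustration.
    """
    gate = softmax(matvec(W_gate, x))
    # Keep the k experts with the largest gate probability.
    chosen = sorted(range(len(experts)), key=lambda i: gate[i], reverse=True)[:top_k]
    norm = sum(gate[i] for i in chosen)  # renormalise over the selected experts
    y = matvec(W, x)
    for i in chosen:
        A, B = experts[i]
        delta = matvec(B, matvec(A, x))  # low-rank update: project down, then up
        weight = gate[i] / norm
        y = [yi + alpha * weight * di for yi, di in zip(y, delta)]
    return y, chosen


rng = random.Random(0)
d_in, d_out, rank, n_experts = 4, 3, 2, 4
W = rand_matrix(d_out, d_in, rng)
# As in standard LoRA, A is random and B starts at zero, so each adapter is a no-op at init.
experts = [
    (rand_matrix(rank, d_in, rng), [[0.0] * rank for _ in range(d_out)])
    for _ in range(n_experts)
]
W_gate = rand_matrix(n_experts, d_in, rng)
x = [1.0, -0.5, 0.25, 2.0]
y, chosen = moe_lora_forward(x, W, experts, W_gate)
```

Because every `B` matrix starts at zero, `y` equals the frozen layer's output `W x` before any training; only the expert pairs and the router would be updated during fine-tuning.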

Findings

01

MoELoRA outperforms LoRA by 4.2% on average in math reasoning tasks.

02

MoELoRA achieves results competitive with GPT-3.5 while using fewer parameters.

03

Contrastive learning reduces expert routing randomness, improving task performance.
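One way such a contrastive objective could look is sketched below. This page does not say how positive and negative pairs are built, so the sketch assumes that outputs produced by the same expert are positives and outputs from different experts are negatives, scored with an InfoNCE-style loss over cosine similarity. The function name `expert_contrastive_loss` and the `temperature` value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def expert_contrastive_loss(outputs, expert_ids, temperature=0.1):
    """InfoNCE-style loss over expert outputs (an assumed formulation).

    outputs[i] is a vector emitted by expert expert_ids[i]. For each anchor,
    other outputs of the same expert are positives; all remaining outputs
    act as negatives in the softmax denominator.
    """
    n = len(outputs)
    total, pairs = 0.0, 0
    for i in range(n):
        sims = {
            j: math.exp(cosine(outputs[i], outputs[j]) / temperature)
            for j in range(n)
            if j != i
        }
        denom = sum(sims.values())
        for j, s in sims.items():
            if expert_ids[j] == expert_ids[i]:
                total += -math.log(s / denom)
                pairs += 1
    # No positive pairs means there is nothing to contrast.
    return total / pairs if pairs else 0.0


# Two tight clusters of outputs, one per expert.
outputs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
specialised = expert_contrastive_loss(outputs, [0, 0, 1, 1])
mixed = expert_contrastive_loss(outputs, [0, 1, 0, 1])
```

In this toy setup `specialised` is smaller than `mixed`: the loss is low when each expert's outputs cluster together and high when expert assignments cut across the clusters, which is the pressure toward specialization that the finding describes.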

Abstract

Fine-tuning is often necessary to enhance the adaptability of Large Language Models (LLMs) to downstream tasks. Nonetheless, the process of updating billions of parameters demands significant computational resources and training time, which poses a substantial obstacle to the widespread application of large-scale models in various scenarios. To address this issue, Parameter-Efficient Fine-Tuning (PEFT) has emerged as a prominent paradigm in recent research. However, current PEFT approaches that employ a limited set of global parameters (such as LoRA, which adds low-rank approximation matrices to all weights) face challenges in flexibly combining different computational modules in downstream tasks. In this work, we introduce a novel PEFT method: MoELoRA. We consider LoRA as a Mixture of Experts (MoE), and to mitigate the random routing phenomenon observed in MoE, we propose the utilization…

Code & Models

Repositories

Leeroo-AI/mergoo
pytorch

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems

Methods: Sparse Evolutionary Training · Linear Layer · Byte Pair Encoding · Attention Dropout · Dense Connections · Cosine Annealing · Adam