MoRAL: MoE Augmented LoRA for LLMs' Lifelong Learning

Shu Yang, Muhammad Asif Ali, Cheng-Long Wang, Lijie Hu, and Di Wang

arXiv:2402.11260 · cs.CL · February 20, 2024 · 1 citation

Open Access

TL;DR

MoRAL introduces a novel approach combining Mixture-of-Experts and Low-Rank Adaptation to enable large language models to learn continuously from question-answer pairs, improving efficiency, robustness, and knowledge retention.

Contribution

The paper proposes MoRAL, a method that integrates MoE and LoRA for lifelong learning of LLMs from simple question-answer (QA) pairs, and introduces a new benchmark (5L-bench) with an accompanying set of evaluation metrics.
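
As a rough illustration of what "integrating MoE and LoRA" can mean, the sketch below adds several low-rank adapters (the experts) to one frozen linear layer and mixes their updates with a softmax router. It is a minimal sketch under stated assumptions, not the authors' implementation: the class name `MoELoRALinear`, the dense softmax routing, the zero initialisation of the `B` matrices, and all sizes are hypothetical choices, since this page does not describe the paper's routing or layer placement.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix used for illustrative initialisation."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class MoELoRALinear:
    """Hypothetical layer: frozen weight plus a gated mixture of LoRA experts."""

    def __init__(self, d_in, d_out, rank=2, n_experts=4, seed=0):
        rng = random.Random(seed)
        # Pretrained weight; it would stay frozen during fine-tuning.
        self.frozen_w = rand_matrix(d_out, d_in, rng)
        # Router that scores each expert for a given input (assumed design).
        self.router_w = rand_matrix(n_experts, d_in, rng)
        # Each expert i holds a low-rank pair: A_i (rank x d_in), B_i (d_out x rank).
        self.lora_a = [rand_matrix(rank, d_in, rng) for _ in range(n_experts)]
        # B_i starts at zero, so the layer initially matches the frozen model.
        self.lora_b = [[[0.0] * rank for _ in range(d_out)] for _ in range(n_experts)]

    def gates(self, x):
        """Softmax weights over experts for input x."""
        return softmax(matvec(self.router_w, x))

    def forward(self, x):
        """Compute W x + sum_i g_i(x) * B_i (A_i x)."""
        out = matvec(self.frozen_w, x)
        for gate, a, b in zip(self.gates(x), self.lora_a, self.lora_b):
            delta = matvec(b, matvec(a, x))
            out = [o + gate * d for o, d in zip(out, delta)]
        return out
```

In this sketch only the router and the `A`/`B` pairs would be trained, which is what keeps the number of updated parameters small relative to the frozen weight. A sparse top-k router could replace the dense softmax; that is another design choice this page does not settle.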

Findings

01. LLMs learn faster in open-book settings than in closed-book settings, with up to 30.15% improvement.

02. MoRAL yields larger gains for models with more parameters.

03. MoRAL is robust to catastrophic forgetting.

Abstract

Adapting large language models (LLMs) to new domains/tasks and enabling them to be efficient lifelong learners is a pivotal challenge. In this paper, we propose MoRAL, i.e., Mixture-of-Experts augmented Low-Rank Adaptation for Lifelong Learning. MoRAL combines the multi-tasking abilities of MoE with the fine-tuning abilities of LoRA for effective lifelong learning of LLMs. In contrast to conventional approaches that use factual triplets as inputs, MoRAL relies on simple question-answer pairs, which is a more practical and effective strategy for robust and efficient learning. Owing to the new data setting, we introduce a new evaluation benchmark, namely Life Long Learning of LLM (5L-bench), encompassing a newly curated dataset of question-answer pairs and a set of evaluation metrics for rigorous evaluation of MoRAL in open-book and closed-book settings. Experimental evaluation shows (i)…

Taxonomy

Topics: Distributed and Parallel Computing Systems · Semantic Web and Ontologies · Advanced Data Processing Techniques

Methods: Sparse Evolutionary Training