Learning Attentional Mixture of LoRAs for Language Model Continual Learning

Jialin Liu; Jianhua Wu; Jie Liu; Yutai Duan

arXiv:2409.19611·cs.CL·October 1, 2024

Open Access

TL;DR

This paper introduces AM-LoRA, an attention-based method for continual learning with large language models that adaptively combines multiple LoRAs, reducing catastrophic forgetting and interference.

Contribution

It proposes an attention mechanism to dynamically integrate LoRAs and employs sparsity constraints to enhance continual learning in LLMs.

Findings

01. AM-LoRA outperforms existing methods on benchmarks.

02. The attention mechanism effectively mitigates interference.

03. Sparse attention improves task-specific knowledge retention.

Abstract

Fine-tuning large language models (LLMs) with Low-Rank Adaptation (LoRA) is widely acknowledged as an effective approach to continual learning on new tasks. However, it often suffers from catastrophic forgetting when handling multiple tasks sequentially. To address this, we propose Attentional Mixture of LoRAs (AM-LoRA), a continual learning approach tailored for LLMs. Specifically, AM-LoRA learns a sequence of LoRAs for a series of tasks, so that knowledge from different tasks is acquired continually. The key to our approach is an attention mechanism that serves as a knowledge mixture module and adaptively integrates information from each LoRA. With the attention mechanism, AM-LoRA can efficiently leverage the distinctive contributions of each LoRA while mitigating the risk of mutually negative interactions among them that may lead to catastrophic forgetting. Moreover, we further introduce $L_1$ …
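
The mixing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring rule (a dot product between each LoRA's output and a learned query vector), the function names `softmax` and `mix_lora_outputs`, and the `query` parameter are all assumptions made for illustration. The $L_1$ term is left out because the abstract is truncated before it says where the penalty is applied.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_lora_outputs(
    lora_outputs: List[Vector], query: Vector
) -> Tuple[Vector, List[float]]:
    """Combine per-task LoRA outputs with attention weights.

    lora_outputs: one output vector per task-specific LoRA (same length each).
    query: hypothetical learned vector used to score each LoRA output
           (an assumption; the source does not specify the scoring rule).
    Returns the mixed vector and the attention weights that produced it.
    """
    # Score each LoRA output against the query (assumed dot-product scoring).
    scores = [sum(q * o for q, o in zip(query, out)) for out in lora_outputs]
    # Normalize scores so the weights are positive and sum to 1.
    weights = softmax(scores)
    # Weighted sum of the LoRA outputs, one coordinate at a time.
    dim = len(lora_outputs[0])
    mixed = [
        sum(w * out[i] for w, out in zip(weights, lora_outputs))
        for i in range(dim)
    ]
    return mixed, weights
```

For example, `mix_lora_outputs([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])` scores both LoRAs equally and returns `([0.5, 0.5], [0.5, 0.5])`, while a query such as `[10.0, 0.0]` shifts almost all of the weight onto the first LoRA. Weights concentrated on a few LoRAs are the kind of behavior the sparsity constraint mentioned in the Contribution section is meant to encourage.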

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling

Methods: Softmax · Attention Is All You Need