Learning Attentional Mixture of LoRAs for Language Model Continual Learning
Jialin Liu, Jianhua Wu, Jie Liu, Yutai Duan

TL;DR
This paper introduces AM-LoRA, an attention-based method for continual learning with large language models that adaptively combines multiple LoRAs, reducing catastrophic forgetting and interference.
Contribution
It proposes an attention mechanism to dynamically integrate LoRAs and employs sparsity constraints to enhance continual learning in LLMs.
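The summary says AM-LoRA adds sparsity constraints to the attention over LoRAs but does not say what form they take. The sketch below is a hypothetical stand-in, not the paper's formulation. Softmax weights are nonnegative and always sum to 1, so their plain L1 norm is constant and cannot push them toward sparsity. For that reason this sketch swaps in an entropy penalty, which is close to 0 when one LoRA receives all the weight and equals log K when the weights are uniform over K LoRAs. The names `entropy_penalty` and `regularized_loss` and the strength `lam` are illustrative assumptions.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over a 1-D array of scores.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def entropy_penalty(scores, eps=1e-12):
    """Shannon entropy of the softmax attention weights over K LoRAs.

    Close to 0 when a single LoRA receives all the weight, log(K) when the
    weights are uniform. Swapped-in sparsity measure, not taken from the paper.
    """
    alpha = softmax(scores)
    return float(-(alpha * np.log(alpha + eps)).sum())


def regularized_loss(task_loss, scores, lam=0.01):
    # lam is a hypothetical regularization strength; larger values push the
    # attention toward relying on fewer LoRAs.
    return task_loss + lam * entropy_penalty(scores)


uniform = np.zeros(4)                     # equal scores -> uniform weights
peaked = np.array([10.0, 0.0, 0.0, 0.0])  # one LoRA dominates
print(entropy_penalty(uniform) > entropy_penalty(peaked))  # prints True
```

Adding the penalty to the task loss trades off fitting the current task against concentrating the attention on a few LoRAs.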
Findings
AM-LoRA outperforms existing continual learning methods on benchmarks.
The attention mechanism effectively mitigates interference.
Sparse attention improves task-specific knowledge retention.
Abstract
Fine-tuning large language models (LLMs) with Low-Rank Adaptation (LoRA) is widely acknowledged as an effective approach to continually learning new tasks. However, it often suffers from catastrophic forgetting when tasks are learned sequentially. To this end, we propose Attentional Mixture of LoRAs (AM-LoRA), a continual learning approach tailored for LLMs. Specifically, AM-LoRA learns a sequence of LoRAs for a series of tasks to continually acquire knowledge from different tasks. The key to our approach is an attention mechanism that serves as a knowledge mixture module, adaptively integrating information from each LoRA. With this attention mechanism, AM-LoRA can efficiently leverage the distinctive contribution of each LoRA while mitigating the risk of negative interactions among them that may lead to catastrophic forgetting. Moreover, we further introduce …
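The attention-based mixture described in the abstract can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. Each task-specific LoRA contributes a low-rank update B_k A_k x, and softmax attention weights decide how much of each update is added to the frozen layer output. The summary does not say how queries and keys are formed, so the input-derived query projection `Wq`, the LoRA-output key projection `Wk`, the function name `am_lora_forward`, and all shapes are hypothetical.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over a 1-D array of scores.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def am_lora_forward(x, W0, loras, Wq, Wk):
    """Attention-weighted mixture of K LoRA updates for one input vector.

    x:     (d_in,) input
    W0:    (d_out, d_in) frozen pretrained weight
    loras: list of K (A, B) pairs, A: (r, d_in), B: (d_out, r)
    Wq:    (d_a, d_in) query projection of the input (hypothetical)
    Wk:    (d_a, d_out) key projection of each LoRA output (hypothetical)
    """
    deltas = np.stack([B @ (A @ x) for A, B in loras])  # (K, d_out) low-rank updates
    q = Wq @ x                                          # (d_a,) query from the input
    keys = deltas @ Wk.T                                # (K, d_a) one key per LoRA
    scores = keys @ q / np.sqrt(q.shape[0])             # (K,) scaled dot products
    alpha = softmax(scores)                             # (K,) mixture weights, sum to 1
    h = W0 @ x + alpha @ deltas                         # frozen output + weighted updates
    return h, alpha


# Usage with random weights: 3 task-specific LoRAs of rank 2.
rng = np.random.default_rng(0)
d_in, d_out, r, d_a, K = 8, 6, 2, 4, 3
x = rng.normal(size=d_in)
W0 = rng.normal(size=(d_out, d_in))
loras = [(rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r))) for _ in range(K)]
Wq = rng.normal(size=(d_a, d_in))
Wk = rng.normal(size=(d_a, d_out))
h, alpha = am_lora_forward(x, W0, loras, Wq, Wk)
print(h.shape, alpha.shape)  # prints (6,) (3,)
```

Because `alpha` is recomputed for every input, the layer can weight the task LoRAs differently per example, which is the adaptive integration the abstract describes; with a single LoRA the weight is 1 and the layer reduces to ordinary LoRA.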
