F-MALLOC: Feed-forward Memory Allocation for Continual Learning in Neural Machine Translation
Junhong Wu, Yuchen Liu, Chengqing Zong

TL;DR
F-MALLOC is a novel continual learning method for neural machine translation that decomposes feed-forward layers into memory units, effectively reducing catastrophic forgetting and improving translation quality across tasks.
Contribution
The paper introduces F-MALLOC, a feed-forward memory allocation approach that enhances continual learning in NMT by safeguarding task-specific memories and ensuring system extensibility.
Findings
F-MALLOC achieves higher BLEU scores than baseline methods.
F-MALLOC exhibits near-zero catastrophic forgetting.
The proposed protocol effectively evaluates multi-stage continual learning.
Abstract
In the evolving landscape of Neural Machine Translation (NMT), the pretrain-then-finetune paradigm has yielded impressive results. However, the persistent challenge of Catastrophic Forgetting (CF) remains a hurdle. While previous work has introduced Continual Learning (CL) methods to address CF, these approaches grapple with the delicate balance between avoiding forgetting and maintaining system extensibility. To address this, we propose a CL method, named F-MALLOC (Feed-forward Memory ALLOCation). F-MALLOC is inspired by recent insights highlighting that feed-forward layers emulate neural memories and encapsulate crucial translation knowledge. It decomposes feed-forward layers into discrete memory cells and allocates these memories to different tasks. By learning to allocate and safeguard these memories, our method effectively alleviates CF…
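The idea of treating a feed-forward layer as a set of memory cells that are allocated to tasks and then protected can be illustrated with a small sketch. This is not the authors' implementation: the layer sizes, the hand-written masks in `task_masks`, and the helper names `ffn` and `protected_update` are assumptions made for illustration only. Here each hidden unit counts as one cell (row i of `W1` as its key, column i of `W2` as its value), a binary mask selects the cells a task may use, and cells owned by an earlier task receive no gradient update.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 4, 8  # illustrative sizes, not taken from the paper

# Hidden unit i is treated as one memory cell:
# row i of W1 is its key, column i of W2 is its value.
W1 = rng.normal(size=(d_ff, d_model))
W2 = rng.normal(size=(d_model, d_ff))

# Hypothetical allocation: which cells each task may use.
# Cells 6 and 7 are left unallocated for future tasks.
task_masks = {
    "task_a": np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=float),
    "task_b": np.array([0, 0, 0, 1, 1, 1, 0, 0], dtype=float),
}

def ffn(W1, W2, x, cell_mask):
    """Feed-forward pass that only reads the cells selected by cell_mask."""
    h = np.maximum(W1 @ x, 0.0) * cell_mask  # ReLU, then drop unselected cells
    return W2 @ h

def protected_update(W1, W2, grad_W1, grad_W2, frozen, lr=0.1):
    """Gradient step that leaves cells marked in `frozen` unchanged."""
    trainable = 1.0 - frozen
    new_W1 = W1 - lr * grad_W1 * trainable[:, None]  # mask rows (keys)
    new_W2 = W2 - lr * grad_W2 * trainable[None, :]  # mask columns (values)
    return new_W1, new_W2

# While training task_b, protect the cells already allocated to task_a.
x = rng.normal(size=d_model)
out_before = ffn(W1, W2, x, task_masks["task_a"])
grad_W1 = rng.normal(size=W1.shape)  # stand-in gradients for the sketch
grad_W2 = rng.normal(size=W2.shape)
W1_new, W2_new = protected_update(W1, W2, grad_W1, grad_W2, task_masks["task_a"])
out_after = ffn(W1_new, W2_new, x, task_masks["task_a"])
```

Because the output for `task_a` depends only on its own cells, and those cells are excluded from the update, `out_before` and `out_after` match in this toy setting. The masks here are fixed by hand, whereas the abstract describes the allocation as learned.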
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
