F-MALLOC: Feed-forward Memory Allocation for Continual Learning in Neural Machine Translation

Junhong Wu; Yuchen Liu; Chengqing Zong

arXiv:2404.04846·cs.CL·October 23, 2024·1 cites

TL;DR

F-MALLOC is a novel continual learning method for neural machine translation that decomposes feed-forward layers into memory units, effectively reducing catastrophic forgetting and improving translation quality across tasks.

Contribution

The paper introduces F-MALLOC, a feed-forward memory allocation approach that enhances continual learning in NMT by safeguarding task-specific memories and ensuring system extensibility.

Findings

01. F-MALLOC achieves higher BLEU scores than baseline methods.

02. F-MALLOC exhibits near-zero catastrophic forgetting.

03. The proposed protocol effectively evaluates multi-stage continual learning.

Abstract

In the evolving landscape of Neural Machine Translation (NMT), the pretrain-then-finetune paradigm has yielded impressive results. However, the persistent challenge of Catastrophic Forgetting (CF) remains a hurdle. While previous work has introduced Continual Learning (CL) methods to address CF, these approaches grapple with the delicate balance between avoiding forgetting and maintaining system extensibility. To address this, we propose a CL method, named F-MALLOC (Feed-forward Memory ALLOCation). F-MALLOC is inspired by recent insights highlighting that feed-forward layers emulate neural memories and encapsulate crucial translation knowledge. It decomposes feed-forward layers into discrete memory cells and allocates these memories to different tasks. By learning to allocate and safeguard these memories, our method effectively alleviates CF…
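
The idea of treating a feed-forward layer as a bank of memory cells that are allocated to tasks can be illustrated with a small sketch. This is not the authors' implementation: the function names (`ffn_as_memory`, `allocate`), the first-fit allocation rule, and the toy "training" step are all assumptions made for illustration, whereas the paper describes learning the allocation. The sketch only shows why updating one task's cells leaves another task's output unchanged when the two sets of cells are disjoint.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def ffn_as_memory(x, keys, values, cell_mask):
    """Feed-forward layer viewed as key-value memory cells.

    Cell i contributes relu(x . keys[i]) * values[i] to the output.
    Cells whose cell_mask entry is False are skipped entirely.
    """
    out = [0.0] * len(values[0])
    for key, value, active in zip(keys, values, cell_mask):
        if not active:
            continue
        coeff = relu(sum(xi * ki for xi, ki in zip(x, key)))
        for j, vj in enumerate(value):
            out[j] += coeff * vj
    return out


def allocate(free_cells, n_cells):
    """Hypothetical first-fit allocation (a stand-in for a learned one)."""
    taken = set(sorted(free_cells)[:n_cells])
    return taken, free_cells - taken


rng = random.Random(0)
d_model, d_ff = 4, 8  # toy sizes, chosen only for the example
keys = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_ff)]
values = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_ff)]

# Hand out disjoint groups of cells to two tasks; the rest stay free.
free = set(range(d_ff))
task_a, free = allocate(free, 3)
task_b, free = allocate(free, 3)

x = [rng.uniform(-1, 1) for _ in range(d_model)]
mask_a = [i in task_a for i in range(d_ff)]
out_a_before = ffn_as_memory(x, keys, values, mask_a)

# Toy "training" on task B: only task B's own cells are modified.
for i in task_b:
    values[i] = [v + 0.5 for v in values[i]]

out_a_after = ffn_as_memory(x, keys, values, mask_a)
```

In this toy setting `out_a_before` and `out_a_after` are identical, because task A reads only cells that task B never writes. The cells left in `free` stand for capacity that remains available to later tasks, which is the extensibility side of the trade-off the abstract mentions.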

Code & Models

Repositories

wjmacro/continualmt (PyTorch, official)

Videos

F-MALLOC: Feed-forward Memory Allocation for Continual Learning in Neural Machine Translation

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis