Continual Learning of Neural Machine Translation within Low Forgetting Risk Regions
Shuhao Gu, Bojie Hu, Yang Feng

TL;DR
This paper introduces a two-stage training approach for neural machine translation: it first identifies low forgetting risk regions, then trains on the new task inside them, so the model adapts to new tasks without catastrophic forgetting.
Contribution
The paper proposes a two-stage training method that searches for low forgetting risk regions based on loss curvature and parameter impact, then restricts continual training to those regions to improve continual learning performance.
Findings
Significant improvements over strong baselines in domain and language adaptation tasks.
Effective avoidance of catastrophic forgetting without access to previous training data.
Enhanced model retention and adaptation capabilities in continual learning scenarios.
Abstract
This paper considers continual learning of a large-scale pretrained neural machine translation model without accessing the previous training data or introducing model separation. We argue that the widely used regularization-based methods, which perform multi-objective learning with an auxiliary loss, suffer from the misestimate problem and cannot always achieve a good balance between the previous and new tasks. To solve the problem, we propose a two-stage training method based on the local features of the real loss. We first search for low forgetting risk regions, where the model can retain its performance on the previous task as the parameters are updated, to avoid the catastrophic forgetting problem. Then we continually train the model within this region using only the new training data to fit the new task. Specifically, we propose two methods to search the low forgetting risk regions,…
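The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract is truncated before it describes the two search methods, so the sketch assumes a diagonal quadratic approximation of the old-task loss and uses it to build a box-shaped region. All names (`search_region`, `train_in_region`, `curvature`, `eps`) and the toy losses are hypothetical.

```python
import numpy as np

def search_region(curvature, eps):
    """Stage 1 (toy assumption): per-parameter radii for a diagonal quadratic
    old-task loss 0.5 * sum(h_i * delta_i**2), chosen so that any point inside
    the box raises the old-task loss by at most eps."""
    d = curvature.size
    return np.sqrt(2.0 * eps / (d * curvature))

def train_in_region(theta_old, radii, grad_new, lr=0.1, steps=200):
    """Stage 2: gradient descent on the new-task loss only, projecting the
    parameters back into the box around theta_old after every step."""
    theta = theta_old.copy()
    lo, hi = theta_old - radii, theta_old + radii
    for _ in range(steps):
        theta = theta - lr * grad_new(theta)
        theta = np.clip(theta, lo, hi)  # stay inside the low-risk region
    return theta

# Toy problem: three parameters with very different old-task curvature.
theta_old = np.array([1.0, -2.0, 0.5])      # optimum of the old task
curvature = np.array([10.0, 0.1, 1.0])      # assumed diagonal curvature h_i
theta_new_opt = np.array([3.0, 1.0, 0.0])   # optimum of the new task
eps = 0.3                                   # tolerated old-task loss increase

def old_loss(theta):
    return 0.5 * np.sum(curvature * (theta - theta_old) ** 2)

def new_loss(theta):
    return 0.5 * np.sum((theta - theta_new_opt) ** 2)

def grad_new(theta):
    return theta - theta_new_opt

radii = search_region(curvature, eps)
theta = train_in_region(theta_old, radii, grad_new)
print("radii:", radii)
print("old-task loss increase:", old_loss(theta))
print("new-task loss before/after:", new_loss(theta_old), new_loss(theta))
```

In this sketch, parameters with low curvature receive a wide range and do most of the adaptation, while high-curvature parameters barely move. Stage 2 never evaluates the old-task loss or touches old data, which mirrors the setting described in the abstract.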
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Cancer-related molecular mechanisms research
