TL;DR
This paper introduces TAALM, a meta-learning-based method that dynamically predicts token importance to improve continual knowledge learning (CKL) in language models, reducing forgetting and improving learning efficiency.
Contribution
The paper proposes a novel meta-learning framework for token weighting in CKL and introduces a new benchmark, LAMA-ckl, to better evaluate the trade-off between learning new knowledge and retaining existing knowledge.
Findings
TAALM achieves state-of-the-art results on CKL benchmarks.
TAALM is compatible with existing CKL methods, enhancing their performance.
The new LAMA-ckl benchmark reveals insights into learning-retention trade-offs.
Abstract
Previous studies on continual knowledge learning (CKL) in large language models (LLMs) have predominantly focused on approaches such as regularization, architectural modifications, and rehearsal techniques to mitigate catastrophic forgetting. However, these methods naively inherit the inefficiencies of standard training procedures, indiscriminately applying uniform weight across all tokens, which can lead to unnecessary parameter updates and increased forgetting. To address these shortcomings, we propose a novel CKL approach termed Train-Attention-Augmented Language Model (TAALM), which enhances learning efficiency by dynamically predicting and applying weights to tokens based on their usefulness. This method employs a meta-learning framework that optimizes token importance predictions, facilitating targeted knowledge updates and minimizing forgetting. Also, we observe that existing…
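The contrast the abstract draws between uniform token weighting and importance-based weighting can be illustrated with a small sketch. This is not the authors' implementation: the function names, the unnormalized weighted sum, and the hand-picked weights are assumptions made for illustration. In TAALM the weights are predicted by a model trained through meta-learning, which is not shown here.

```python
import math


def weighted_nll(token_log_probs, token_weights):
    """Token-weighted negative log-likelihood for one sequence.

    token_log_probs: log-probability the language model assigns to each
        target token.
    token_weights: per-token importance weights. Hypothetical input here;
        in the paper's setting they would come from a learned predictor.
    """
    if len(token_log_probs) != len(token_weights):
        raise ValueError("need exactly one weight per token")
    # Each token's loss (-log p) is scaled by its weight before summing,
    # so tokens with weight 0 contribute no loss and hence no gradient.
    return -sum(w * lp for w, lp in zip(token_weights, token_log_probs))


def uniform_nll(token_log_probs):
    """Standard training objective: every token gets weight 1."""
    return weighted_nll(token_log_probs, [1.0] * len(token_log_probs))


# Toy sequence of three tokens with model probabilities 0.5, 0.25, 0.5.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5)]

# Assumed weights: only the middle token is treated as useful.
weights = [0.0, 1.0, 0.0]

uniform_loss = uniform_nll(log_probs)             # equals log(16)
focused_loss = weighted_nll(log_probs, weights)   # equals log(4)
```

With uniform weights all three tokens drive parameter updates. With the assumed weights only the middle token does, which is the kind of targeted update the abstract credits with reducing unnecessary changes and forgetting.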
