Towards Continual Knowledge Learning of Language Models
Joel Jang, Seonghyeon Ye, Sohee Yang, Joongbo Shin, Janghoon Han, Gyeonghun Kim, Stanley Jungkyu Choi, Minjoon Seo

TL;DR
This paper introduces a new continual learning framework for language models, focusing on maintaining and updating world knowledge over time, and provides a benchmark to evaluate such models' ability to retain and acquire knowledge.
Contribution
It formulates the Continual Knowledge Learning problem, creates a benchmark and metrics, and demonstrates the challenges and importance of maintaining evolving knowledge in language models.
Findings
CKL presents unique challenges not seen in previous CL setups.
Parameter expansion is crucial for retaining and learning knowledge simultaneously.
CKL highlights critical causes of knowledge forgetting in language models.
Abstract
Large Language Models (LMs) are known to encode world knowledge in their parameters as they pretrain on a vast amount of web corpus, which is often utilized for performing knowledge-dependent downstream tasks such as question answering, fact-checking, and open dialogue. In real-world scenarios, the world knowledge stored in the LMs can quickly become outdated as the world changes, but it is non-trivial to avoid catastrophic forgetting and reliably acquire new knowledge while preserving invariant knowledge. To push the community towards better maintenance of ever-changing LMs, we formulate a new continual learning (CL) problem called Continual Knowledge Learning (CKL). We construct a new benchmark and metric to quantify the retention of time-invariant world knowledge, the update of outdated knowledge, and the acquisition of new knowledge. We adopt applicable recent methods from…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
