Investigating Continual Pretraining in Large Language Models: Insights and Implications
Çağatay Yıldız, Nishaanth Kanna Ravichandran, Nitin Sharma, Matthias Bethge, Beyza Ermis

TL;DR
This paper explores continual pretraining in large language models, introducing a new benchmark to evaluate adaptability, and finds that continual pretraining improves performance, especially in larger models and when domain sequences are semantically similar.
Contribution
It introduces a novel benchmark for continual learning in LLMs and provides comprehensive insights into how model size and domain similarity affect continual pretraining effectiveness.
Findings
Continual pretraining improves models under 1.5B parameters.
Larger models achieve lower (better) perplexity than smaller ones after continual pretraining.
Smaller models are more sensitive to learning and forgetting during continual pretraining.
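The findings above are stated in terms of perplexity, learning, and forgetting. As an illustration only, the sketch below assumes one common way to quantify these. Perplexity is taken as the exponential of the mean per-token negative log-likelihood. A matrix ppl[t][j] is assumed to hold the perplexity on domain j after training stage t, where stage t trains on domain t. The helper names (perplexity, learning, forgetting), the score definitions, and the numbers are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def learning(ppl: List[List[float]], base: List[float]) -> List[float]:
    """Hypothetical learning score per domain j: perplexity drop from the
    base model (before continual pretraining) to right after stage j."""
    return [base[j] - ppl[j][j] for j in range(len(base))]


def forgetting(ppl: List[List[float]]) -> List[float]:
    """Hypothetical forgetting score per domain j: perplexity increase from
    right after stage j to the end of the sequence (positive = forgotten)."""
    last = len(ppl) - 1
    return [ppl[last][j] - ppl[j][j] for j in range(len(ppl))]


# ppl[t][j]: perplexity on domain j after stage t (stage t trains on domain t).
# Values are made up for illustration only.
base = [20.0, 25.0, 30.0]
ppl = [
    [12.0, 24.0, 29.0],
    [14.0, 13.0, 28.0],
    [15.0, 14.5, 11.0],
]
print(learning(ppl, base))  # [8.0, 12.0, 19.0]
print(forgetting(ppl))      # [3.0, 1.5, 0.0]
```

Under this assumed definition, a positive forgetting score means perplexity on an earlier domain rose after training on later domains, and the final domain always scores 0.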
Abstract
Continual learning (CL) in large language models (LLMs) is an evolving domain that focuses on developing efficient and sustainable training strategies to adapt models to emerging knowledge and achieve robustness in dynamic environments. Our primary emphasis is on continual domain-adaptive pretraining, a process designed to equip LLMs with the ability to integrate new information from various domains while retaining previously learned knowledge. Since existing works concentrate mostly on continual fine-tuning for a limited selection of downstream tasks or training domains, we introduce a new benchmark designed to measure the adaptability of LLMs to changing pretraining data landscapes. We further examine the impact of model size on learning efficacy and forgetting, as well as how the progression and similarity of emerging domains affect the knowledge transfer within these models. Our…
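The abstract relates knowledge transfer to how similar consecutive domains are. One way to make "semantically similar" concrete, offered here purely as an assumption, is to represent each domain by the mean of its document embeddings and compare domains by cosine similarity. A similarity-ordered sequence can then be built greedily by always moving to the most similar unvisited domain. The embedding vectors, domain names, and the helpers centroid, cosine and similarity_order are hypothetical. This sketch is not presented as the paper's procedure.

```python
import math
from typing import Dict, List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Mean of a domain's document embeddings (the encoder is not specified here)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def similarity_order(centroids: Dict[str, List[float]], start: str) -> List[str]:
    """Greedy chain: after each domain, visit the most similar unvisited one."""
    order = [start]
    remaining = set(centroids) - {start}
    while remaining:
        current = centroids[order[-1]]
        # Iterating over sorted names makes ties resolve deterministically.
        nxt = max(sorted(remaining), key=lambda d: cosine(current, centroids[d]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Toy 3-d "embeddings" standing in for domain centroids, invented for illustration.
domains = {
    "physics": [1.0, 0.1, 0.0],
    "chemistry": [0.9, 0.3, 0.0],
    "history": [0.0, 0.2, 1.0],
    "biology": [0.7, 0.6, 0.1],
}
print(similarity_order(domains, "physics"))
# ['physics', 'chemistry', 'biology', 'history']
```

In practice each entry in domains would come from centroid(...) over that domain's document embeddings. Contrasting such an ordering with a shuffled one is one way to probe whether domain similarity changes transfer.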
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems
Methods: Focus
