Investigating Continual Pretraining in Large Language Models: Insights and Implications

Çağatay Yıldız; Nishaanth Kanna Ravichandran; Nitin Sharma; Matthias Bethge; Beyza Ermis

arXiv:2402.17400·cs.CL·February 13, 2025·3 cites

Open Access

TL;DR

This paper explores continual pretraining in large language models, introducing a new benchmark to evaluate adaptability, and finds that continual pretraining improves performance, especially in larger models and when domain sequences are semantically similar.

Contribution

It introduces a novel benchmark for continual learning in LLMs and provides comprehensive insights into how model size and domain similarity affect continual pretraining effectiveness.

Findings

01 Continual pretraining improves models under 1.5B parameters.

02 Larger models outperform smaller ones in perplexity after continual pretraining.

03 Smaller models are more sensitive to learning and forgetting during continual pretraining.
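
The findings above are stated in terms of perplexity and forgetting. As a rough illustration only, the sketch below shows one common way such quantities could be computed. The function names `perplexity` and `mean_forgetting`, the layout of `ppl_history`, and the forgetting definition itself are assumptions made for this example; they are not taken from the paper, whose benchmark may define its metrics differently.

```python
import math


def perplexity(token_log_probs):
    """Perplexity as exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def mean_forgetting(ppl_history):
    """Average perplexity increase on earlier domains (hypothetical metric).

    ppl_history[i][j] is the perplexity on domain j measured right after
    training on domain i, for j <= i. For every domain except the last,
    compare the perplexity at the end of the sequence with the perplexity
    right after that domain was learned. A positive value means the model
    got worse on earlier domains, i.e. it forgot.
    """
    last = len(ppl_history) - 1
    if last < 1:
        raise ValueError("need at least two training stages")
    deltas = [ppl_history[last][j] - ppl_history[j][j] for j in range(last)]
    return sum(deltas) / len(deltas)


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token.
    print(perplexity([math.log(0.25)] * 4))

    # Made-up perplexities for three domains trained in sequence.
    history = [
        [20.0],
        [22.0, 15.0],
        [25.0, 16.0, 10.0],
    ]
    print(mean_forgetting(history))
```

The numbers in `history` are invented for illustration and do not come from the paper. Under this definition, a model that is "more sensitive to forgetting" would show a larger positive value.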

Abstract

Continual learning (CL) in large language models (LLMs) is an evolving domain that focuses on developing efficient and sustainable training strategies to adapt models to emerging knowledge and achieve robustness in dynamic environments. Our primary emphasis is on continual domain-adaptive pretraining, a process designed to equip LLMs with the ability to integrate new information from various domains while retaining previously learned knowledge. Since existing works concentrate mostly on continual fine-tuning for a limited selection of downstream tasks or training domains, we introduce a new benchmark designed to measure the adaptability of LLMs to changing pretraining data landscapes. We further examine the impact of model size on learning efficacy and forgetting, as well as how the progression and similarity of emerging domains affect the knowledge transfer within these models. Our…

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems

MethodsFocus