Continual Training of Language Models for Few-Shot Learning
Zixuan Ke, Haowei Lin, Yijia Shao, Hu Xu, Lei Shu, and Bing Liu

TL;DR
This paper introduces CPT, a continual post-training system that incrementally adapts large language models with unlabeled domain data to enhance few-shot learning without forgetting prior knowledge.
Contribution
It proposes the first continual post-training method for language models, enabling incremental domain adaptation while preserving previous skills.
Findings
CPT improves few-shot learning performance across multiple domains.
Continual post-training maintains model knowledge without degradation.
Experimental results show significant performance gains.
Abstract
Recent work on applying large language models (LMs) achieves impressive performance in many NLP applications. Adapting or posttraining an LM using an unlabeled domain corpus can produce even better performance for end-tasks in the domain. This paper proposes the problem of continually extending an LM by incrementally post-train the LM with a sequence of unlabeled domain corpora to expand its knowledge without forgetting its previous skills. The goal is to improve the few-shot end-task learning in these domains. The resulting system is called CPT (Continual PostTraining), which to our knowledge, is the first continual post-training system. Experimental results verify its effectiveness.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDomain Adaptation and Few-Shot Learning · Topic Modeling · Multimodal Machine Learning Applications
