Continual Training of Language Models for Few-Shot Learning

Zixuan Ke, Haowei Lin, Yijia Shao, Hu Xu, Lei Shu, and Bing Liu

arXiv:2210.05549 · cs.CL · October 12, 2022

Open Access

TL;DR

This paper introduces CPT, a continual post-training system that incrementally adapts large language models with unlabeled domain data to enhance few-shot learning without forgetting prior knowledge.

Contribution

It proposes what is, to the authors' knowledge, the first continual post-training system for language models, enabling incremental domain adaptation while preserving previously learned skills.

Findings

01 CPT is designed to improve few-shot end-task learning in each domain the LM is post-trained on.

02 Continual post-training expands the LM's knowledge without forgetting its previous skills.

03 Experimental results verify the effectiveness of CPT.

Abstract

Recent work on applying large language models (LMs) achieves impressive performance in many NLP applications. Adapting or post-training an LM using an unlabeled domain corpus can produce even better performance for end-tasks in the domain. This paper proposes the problem of continually extending an LM by incrementally post-training it with a sequence of unlabeled domain corpora, expanding its knowledge without forgetting its previous skills. The goal is to improve few-shot end-task learning in these domains. The resulting system is called CPT (Continual Post-Training), which, to our knowledge, is the first continual post-training system. Experimental results verify its effectiveness.

Code & Models

Models

🤗 UIC-Liu-Lab/CPT
model · 31 dl · ♡ 3

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Multimodal Machine Learning Applications