A Compact Pretraining Approach for Neural Language Models
Shahriar Golchin, Mihai Surdeanu, Nazgol Tavabi, Ata Kiapour

TL;DR
This paper introduces a compact pretraining method for neural language models that uses domain-specific summaries and keywords to improve efficiency and effectiveness, reducing pretraining time and enhancing downstream task performance.
Contribution
The study proposes a novel approach to domain adaptation by constructing compact data subsets using summaries and keywords, leading to faster and more effective pretraining of neural language models.
Findings
Pretraining on compact subsets improves classifier performance.
The method cuts pretraining time by up to a factor of five.
Task-specific classifiers built on models pretrained with this approach outperform those built on traditionally pretrained models.
Abstract
Domain adaptation for large neural language models (NLMs) is coupled with massive amounts of unstructured data in the pretraining phase. In this study, however, we show that pretrained NLMs learn in-domain information more effectively and faster from a compact subset of the data that focuses on the key information in the domain. We construct these compact subsets from the unstructured data using a combination of abstractive summaries and extractive keywords. In particular, we rely on BART to generate abstractive summaries, and KeyBERT to extract keywords from these summaries (or the original unstructured text directly). We evaluate our approach using six different settings: three datasets combined with two distinct NLMs. Our results reveal that the task-specific classifiers trained on top of NLMs pretrained using our method outperform methods based on traditional pretraining, i.e.,…
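The subset construction described in the abstract (summarize each document, then extract keywords from the summary or from the original text) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The names `CompactRecord`, `build_compact_subset`, `make_bart_summarizer`, and `make_keybert_extractor` are hypothetical. The `facebook/bart-large-cnn` checkpoint, the `top_n` value, and the way summaries and keywords are stored together are assumptions the abstract does not specify. The two factory functions import `transformers` and `keybert` lazily, and the `summarization` pipeline task may not be available in every `transformers` release.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class CompactRecord:
    """One entry of the compact subset (hypothetical container)."""
    summary: str
    keywords: List[str]


def build_compact_subset(
    documents: Iterable[str],
    summarize: Callable[[str], str],
    extract_keywords: Callable[[str], List[str]],
    keywords_from_summary: bool = True,
) -> List[CompactRecord]:
    """Pair each document's summary with keywords.

    Keywords come from the summary by default, or from the original
    text when keywords_from_summary is False (both options are
    mentioned in the abstract).
    """
    records = []
    for doc in documents:
        summary = summarize(doc)
        keyword_source = summary if keywords_from_summary else doc
        records.append(CompactRecord(summary, extract_keywords(keyword_source)))
    return records


def make_bart_summarizer(model_name: str = "facebook/bart-large-cnn"):
    """Wrap a Hugging Face summarization pipeline (checkpoint is an assumption)."""
    from transformers import pipeline  # imported lazily; heavy dependency

    summarizer = pipeline("summarization", model=model_name)

    def summarize(text: str) -> str:
        # truncation=True clips inputs longer than the model's limit
        return summarizer(text, truncation=True)[0]["summary_text"]

    return summarize


def make_keybert_extractor(top_n: int = 5):
    """Wrap KeyBERT keyword extraction (top_n is an assumption)."""
    from keybert import KeyBERT  # imported lazily; heavy dependency

    model = KeyBERT()

    def extract(text: str) -> List[str]:
        # extract_keywords returns (keyword, score) pairs; keep the keywords
        return [kw for kw, _score in model.extract_keywords(text, top_n=top_n)]

    return extract
```

With both libraries installed, `build_compact_subset(docs, make_bart_summarizer(), make_keybert_extractor())` would produce one record per document. Passing the summarizer and keyword extractor in as plain callables keeps the pairing logic testable without downloading any model.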
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Adam · Byte Pair Encoding · Dense Connections · Dropout · Residual Connection · Multi-Head Attention
