Do not Mask Randomly: Effective Domain-adaptive Pre-training by Masking In-domain Keywords
Shahriar Golchin, Mihai Surdeanu, Nazgol Tavabi, Ata Kiapour

TL;DR
This paper introduces a domain-adaptive pre-training method that selectively masks in-domain keywords identified by KeyBERT, improving model performance with reasonable additional computational cost.
Contribution
It presents a novel in-domain pre-training strategy that outperforms random masking and standard pre-training, using keyword-based masking to enhance domain adaptation.
Findings
Outperforms random masking in domain adaptation tasks
Requires only 7-15% additional pre-training time
Effective across multiple datasets and models
Abstract
We propose a novel task-agnostic in-domain pre-training method that sits between generic pre-training and fine-tuning. Our approach selectively masks in-domain keywords, i.e., words that provide a compact representation of the target domain. We identify such keywords using KeyBERT (Grootendorst, 2020). We evaluate our approach using six different settings: three datasets combined with two distinct pre-trained language models (PLMs). Our results reveal that the fine-tuned PLMs adapted using our in-domain pre-training strategy outperform PLMs that used in-domain pre-training with random masking as well as those that followed the common pre-train-then-fine-tune paradigm. Further, the overhead of identifying in-domain keywords is reasonable, e.g., 7-15% of the pre-training time (for two epochs) for BERT Large (Devlin et al., 2019).
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
MethodsMulti-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Linear Warmup With Linear Decay · Residual Connection · Adam · Dense Connections · Dropout · Refunds@Expedia|||How do I get a full refund from Expedia?
