Efficient Domain Adaptation of Language Models via Adaptive Tokenization
Vin Sachidananda, Jason S. Kessler, Yi-an Lai

TL;DR
This paper introduces an adaptive tokenization method for domain adaptation of language models, achieving near domain-specific pretraining performance with less training time and smaller models.
Contribution
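It proposes adapting the tokenizer of a pretrained language model by selecting domain-specific subword sequences from divergences between the conditional token distributions of the base and domain corpora, reducing the need for extensive domain-specific pretraining.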
Findings
Achieves >97% of the performance benefits of domain-specific pretraining
Produces smaller models, with less training and inference time, than other tokenizer-augmentation approaches
Faster adaptation using tokenizer adjustment compared to full pretraining
Abstract
Contextual embedding-based language models trained on large data sets, such as BERT and RoBERTa, provide strong performance across a wide range of tasks and are ubiquitous in modern NLP. It has been observed that fine-tuning these models on tasks involving data from domains different from that on which they were pretrained can lead to suboptimal performance. Recent work has explored approaches to adapt pretrained language models to new domains by incorporating additional pretraining using domain-specific corpora and task data. We propose an alternative approach for transferring pretrained language models to new domains by adapting their tokenizers. We show that domain-specific subword sequences can be efficiently determined directly from divergences in the conditional token distributions of the base and domain-specific corpora. In datasets from four disparate domains, we find adaptive…
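As a rough illustration of how sequences might be selected "directly from divergences in the conditional token distributions", the sketch below ranks subword n-grams by a pointwise KL-style score between a base and a domain corpus. It is a simplified reading of the abstract, not the authors' implementation: the function names, the count-based probability estimate, the `min_count` filter, and the `eps` smoothing floor are all assumptions made for the example.

```python
import math
from collections import Counter


def ngram_counts(corpus, max_len):
    """Count every token n-gram of length 1..max_len in a tokenized corpus."""
    counts = Counter()
    for tokens in corpus:
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def conditional_prob(counts, seq):
    """Estimate P(last token | preceding tokens) from n-gram counts."""
    denom = counts.get(seq[:-1], 0)
    return counts.get(seq, 0) / denom if denom else 0.0


def rank_domain_sequences(base_corpus, domain_corpus,
                          max_len=3, min_count=2, eps=1e-6):
    """Rank multi-token sequences by how strongly the domain corpus
    prefers them over the base corpus (hypothetical scoring rule)."""
    base = ngram_counts(base_corpus, max_len)
    domain = ngram_counts(domain_corpus, max_len)
    scored = []
    for seq, count in domain.items():
        # Only multi-token sequences are candidates for new vocabulary entries.
        if len(seq) < 2 or count < min_count:
            continue
        p_domain = conditional_prob(domain, seq)
        # Floor the base probability so unseen sequences do not divide by zero.
        p_base = max(conditional_prob(base, seq), eps)
        score = p_domain * math.log(p_domain / p_base)
        scored.append((score, seq))
    scored.sort(reverse=True)
    return [(seq, score) for score, seq in scored]
```

The top-ranked sequences would be the candidates to add to the tokenizer's vocabulary as new domain-specific tokens. How many to keep, and how their embeddings are initialized, are left out of this sketch.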
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Linear Warmup With Linear Decay · Weight Decay · Adam · Residual Connection · Softmax
