Efficient Domain Adaptation of Language Models via Adaptive Tokenization

Vin Sachidananda; Jason S. Kessler; Yi-an Lai

arXiv:2109.07460·cs.CL·September 16, 2021·1 cites

Open Access

TL;DR

This paper introduces an adaptive tokenization method for domain adaptation of language models, achieving near domain-specific pretraining performance with less training time and smaller models.

Contribution

It proposes a novel tokenizer adaptation approach based on divergence in token distributions, reducing the need for extensive domain-specific pretraining.

Findings

01

Retains >97% of the performance benefits of domain-specific pretraining

02

Produces smaller models, with less training and inference time, than other tokenizer augmentation approaches

03

Faster adaptation using tokenizer adjustment compared to full pretraining

Abstract

Contextual embedding-based language models trained on large data sets, such as BERT and RoBERTa, provide strong performance across a wide range of tasks and are ubiquitous in modern NLP. It has been observed that fine-tuning these models on tasks involving data from domains different from that on which they were pretrained can lead to suboptimal performance. Recent work has explored approaches to adapt pretrained language models to new domains by incorporating additional pretraining using domain-specific corpora and task data. We propose an alternative approach for transferring pretrained language models to new domains by adapting their tokenizers. We show that domain-specific subword sequences can be efficiently determined directly from divergences in the conditional token distributions of the base and domain-specific corpora. In datasets from four disparate domains, we find adaptive…
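To make the idea of selecting subword sequences by distributional divergence concrete, here is a minimal sketch. It is not the authors' implementation. It assumes both corpora are already tokenized by the base tokenizer, estimates the conditional probability of each sequence's last token given its prefix from raw n-gram counts, and ranks sequences by a pointwise KL-style score p_domain * log(p_domain / p_base). The function names, the smoothing constant, and the `min_count` and `max_len` parameters are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def ngram_counts(sequences, max_len):
    """Count every contiguous token n-gram of length 1..max_len."""
    counts = Counter()
    for seq in sequences:
        for n in range(1, max_len + 1):
            for i in range(len(seq) - n + 1):
                counts[tuple(seq[i:i + n])] += 1
    return counts


def conditional_prob(counts, ngram):
    """Estimate P(last token | preceding tokens) for an n-gram with n >= 2."""
    prefix_count = counts[ngram[:-1]]
    return counts[ngram] / prefix_count if prefix_count else 0.0


def divergent_sequences(base, domain, max_len=3, min_count=2,
                        smoothing=1e-6, top_k=5):
    """Rank domain n-grams by p_dom * log(p_dom / p_base).

    Assumption: this pointwise KL-style score stands in for the
    divergence measure described in the abstract.
    """
    base_counts = ngram_counts(base, max_len)
    domain_counts = ngram_counts(domain, max_len)
    scores = {}
    for gram, count in domain_counts.items():
        # Single tokens are already in the vocabulary; rare n-grams are noisy.
        if len(gram) < 2 or count < min_count:
            continue
        p_dom = conditional_prob(domain_counts, gram)
        # Smoothing keeps the ratio finite when the base corpus lacks the n-gram.
        p_base = conditional_prob(base_counts, gram) + smoothing
        scores[gram] = p_dom * math.log(p_dom / p_base)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Calling `divergent_sequences(base_token_lists, domain_token_lists)` returns the highest-scoring token sequences, which would be the candidates for merging into new domain-specific vocabulary entries. How the embeddings of those new tokens are initialized is a separate step that this sketch does not cover.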

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Linear Warmup With Linear Decay · Weight Decay · Adam · Residual Connection · Softmax