ExtremeBERT: A Toolkit for Accelerating Pretraining of Customized BERT

Rui Pan; Shizhe Diao; Jianlin Chen; Tong Zhang

arXiv:2211.17201 · cs.CL · December 1, 2022 · 1 citation

TL;DR

ExtremeBERT is a toolkit that significantly accelerates BERT pretraining, making it more accessible and customizable for researchers and industry with limited resources.

Contribution

The paper introduces an easy-to-use toolkit that cuts BERT pretraining time by over 6x for BERT Base and 9x for BERT Large relative to the original BERT paper, making pretraining on customized datasets affordable with limited resources.

Findings

01

Over 6x faster pretraining for BERT Base

02

Over 9x faster pretraining for BERT Large

03

Reaches the same or better GLUE scores than the original BERT paper

Abstract

In this paper, we present ExtremeBERT, a toolkit for accelerating and customizing BERT pretraining. Our goal is to provide an easy-to-use BERT pretraining toolkit for the research community and industry, so that pretraining popular language models on customized datasets becomes affordable with limited resources. Experiments show that, to achieve the same or better GLUE scores, the time cost of our toolkit is over $6\times$ lower for BERT Base and $9\times$ lower for BERT Large than that reported in the original BERT paper. The documentation and code are released at https://github.com/extreme-bert/extreme-bert under the Apache-2.0 license.

Code & Models

Repositories

extreme-bert/extreme-bert
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Dense Connections · WordPiece · Residual Connection · Linear Warmup With Linear Decay · Layer Normalization