ExtremeBERT: A Toolkit for Accelerating Pretraining of Customized BERT
Rui Pan, Shizhe Diao, Jianlin Chen, Tong Zhang

TL;DR
ExtremeBERT is a toolkit that significantly accelerates BERT pretraining, making it more accessible and customizable for researchers and industry with limited resources.
Contribution
It introduces a user-friendly toolkit that cuts BERT pretraining time by more than 6x for BERT Base and more than 9x for BERT Large, enabling efficient customization on various datasets.
Findings
Over 6x faster pretraining for BERT Base
Over 9x faster pretraining for BERT Large
Achieves comparable or better GLUE scores
Abstract
In this paper, we present ExtremeBERT, a toolkit for accelerating and customizing BERT pretraining. Our goal is to provide an easy-to-use BERT pretraining toolkit for the research community and industry, so that pretraining popular language models on customized datasets becomes affordable with limited resources. Experiments show that, to achieve the same or better GLUE scores, the time cost of our toolkit is over 6x less for BERT Base and 9x less for BERT Large when compared with the original BERT paper. The documentation and code are released at https://github.com/extreme-bert/extreme-bert under the Apache-2.0 license.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Dense Connections · WordPiece · Residual Connection · Linear Warmup With Linear Decay · Layer Normalization
