MicroBERT: Effective Training of Low-resource Monolingual BERTs through Parameter Reduction and Multitask Learning
Luke Gessler, Amir Zeldes

TL;DR
MicroBERT shows that sharply reducing model size and adding linguistically rich supervised tasks (part-of-speech tagging and dependency parsing) to pretraining can substantially improve low-resource monolingual language models, yielding gains over the much larger multilingual mBERT on dependency parsing and NER.
Contribution
This work introduces MicroBERT, a monolingual BERT variant for low-resource languages that combines parameter reduction with multitask learning to improve downstream parsing and NER performance.
Findings
MicroBERT achieves gains of up to 18% in parser LAS over the multilingual baseline, mBERT.
MicroBERT improves NER F1 by up to 11% over mBERT.
MicroBERT has less than 1% of mBERT's parameter count, with comparable or better performance.
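To make the "less than 1%" figure concrete, the following is a rough sketch of how parameter count scales in a BERT-style encoder. The mBERT settings are those of the commonly distributed bert-base-multilingual-cased configuration; the small configuration (`vocab_size=10000`, `hidden=100`, `layers=3`, `intermediate=400`) is a hypothetical illustration and should not be read as the configuration reported in the paper. The function name `bert_param_count` is likewise an assumption made for this example.

```python
def bert_param_count(vocab_size, hidden, layers, intermediate,
                     max_positions=512, type_vocab=2):
    """Count weights and biases of a BERT-style encoder (embeddings, layers, pooler)."""
    # Token, position, and segment embeddings, plus one LayerNorm (scale + bias).
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + 2 * hidden
    # Query, key, value, and output projections, plus the post-attention LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden
    # Two feed-forward projections, plus the post-feed-forward LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + 2 * hidden)
    # Dense pooler applied to the first token's representation.
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


# mBERT-sized encoder (bert-base-multilingual-cased settings).
mbert = bert_param_count(vocab_size=119547, hidden=768, layers=12, intermediate=3072)

# Hypothetical small encoder, chosen only for illustration.
small = bert_param_count(vocab_size=10000, hidden=100, layers=3, intermediate=400)

print(f"mBERT-sized: {mbert:,} parameters")
print(f"small:       {small:,} parameters ({small / mbert:.2%} of mBERT-sized)")
```

Under these assumptions, most of the saving comes from two places: the embedding table shrinks with both vocabulary size and hidden size, and each encoder layer shrinks roughly quadratically with hidden size.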
Abstract
Transformer language models (TLMs) are critical for most NLP tasks, but they are difficult to create for low-resource languages because of how much pretraining data they require. In this work, we investigate two techniques for training monolingual TLMs in a low-resource setting: greatly reducing TLM size, and complementing the masked language modeling objective with two linguistically rich supervised tasks (part-of-speech tagging and dependency parsing). Results from 7 diverse languages indicate that our model, MicroBERT, is able to produce marked improvements in downstream task evaluations relative to a typical monolingual TLM pretraining approach. Specifically, we find that monolingual MicroBERT models achieve gains of up to 18% for parser LAS and 11% for NER F1 compared to a multilingual baseline, mBERT, while having less than 1% of its parameter count. We conclude reducing TLM…
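The multitask setup described in the abstract can be sketched as a combined objective over three tasks. This is a minimal illustration, not the authors' implementation: it assumes each task is scored with a mean cross-entropy, that dependency parsing is treated as choosing a head position for each token, and that the three losses are summed with equal weights. The names `cross_entropy` and `multitask_loss` and the `weights` parameter are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def multitask_loss(mlm, pos, parse, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of mean cross-entropies for the three tasks.

    Each argument is a list of (logits, gold_index) pairs:
      mlm   - vocabulary logits at masked positions
      pos   - part-of-speech tag logits per token
      parse - head-position logits per token (assumed parsing formulation)
    The equal default weights are an assumption, not taken from the paper.
    """
    def mean_ce(examples):
        return sum(cross_entropy(l, t) for l, t in examples) / len(examples)

    return sum(w * mean_ce(task) for w, task in zip(weights, (mlm, pos, parse)))


# Toy batch: one masked token, two tagged tokens, two tokens with gold heads.
loss = multitask_loss(
    mlm=[([2.0, 0.5, -1.0], 0)],
    pos=[([0.1, 1.2], 1), ([0.3, 0.3], 0)],
    parse=[([0.0, 1.5, 0.2], 1), ([1.0, 0.0, 0.0], 0)],
)
print(f"combined loss: {loss:.4f}")
```

In a real training loop the logits would come from task-specific heads on top of a shared encoder, so gradients from the tagging and parsing losses also update the encoder that serves masked language modeling.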
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: mBERT
