Evaluating Language Model Finetuning Techniques for Low-resource Languages
Jan Christian Blaise Cruz, Charibeth Cheng

TL;DR
This paper introduces WikiText-TL-39, a new Filipino language modeling dataset, and shows that language model finetuning techniques such as BERT and ULMFiT can train robust classifiers in low-resource settings, with validation error rising by at most 0.0782 when the training set shrinks from 10K to 1K examples.
Contribution
The paper provides a new benchmark dataset for Filipino and shows that existing language model finetuning techniques are effective for low-resource languages.
Findings
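Finetuning with BERT and ULMFiT keeps validation error low as training data shrinks: at most a 0.0782 increase when going from 10K to 1K training examples.
The paper introduces WikiText-TL-39, a benchmark dataset for Filipino language modeling.
Robust classifiers can be trained consistently in low-resource settings, as shown by finetuning on a privately-held sentiment dataset.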
Abstract
Unlike mainstream languages (such as English and French), low-resource languages often suffer from a lack of expert-annotated corpora and benchmark resources that make it hard to apply state-of-the-art techniques directly. In this paper, we alleviate this scarcity problem for the low-resourced Filipino language in two ways. First, we introduce a new benchmark language modeling dataset in Filipino which we call WikiText-TL-39. Second, we show that language model finetuning techniques such as BERT and ULMFiT can be used to consistently train robust classifiers in low-resource settings, experiencing at most a 0.0782 increase in validation error when the number of training examples is decreased from 10K to 1K while finetuning using a privately-held sentiment dataset.
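The degradation figure in the abstract (the increase in validation error when training examples drop from 10K to 1K) can be illustrated with a minimal sketch. This is not the authors' code: `error_by_train_size`, `error_increase`, and the `train_and_eval` callback are hypothetical names introduced here, and the callback stands in for whatever finetuning and validation routine (BERT, ULMFiT, or otherwise) is actually used. The demo at the bottom uses a placeholder callback and placeholder data, so its numbers are not real results.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical example format: (text, label). Not taken from the paper.
Example = Tuple[str, int]


def error_by_train_size(
    train: Sequence[Example],
    sizes: Sequence[int],
    train_and_eval: Callable[[List[Example]], float],
    seed: int = 0,
) -> Dict[int, float]:
    """Train on nested random subsets and record validation error per size.

    `train_and_eval` is an assumed callback: it finetunes a model on the
    given subset and returns its validation error as a float.
    """
    if max(sizes) > len(train):
        raise ValueError("requested subset is larger than the training set")
    rng = random.Random(seed)
    shuffled = list(train)
    rng.shuffle(shuffled)
    # Smaller subsets are prefixes of larger ones, so the runs are comparable.
    return {n: train_and_eval(shuffled[:n]) for n in sizes}


def error_increase(errors: Dict[int, float]) -> float:
    """Validation error at the smallest size minus error at the largest size."""
    return errors[min(errors)] - errors[max(errors)]


if __name__ == "__main__":
    # Placeholder data and a placeholder callback, for illustration only.
    data = [(f"example {i}", i % 2) for i in range(100)]
    errors = error_by_train_size(data, [10, 100], lambda subset: 1.0 / len(subset))
    print(errors, error_increase(errors))
```

In a real experiment the callback would wrap the finetuning run, and the sizes would be the ones reported in the abstract (1K and 10K).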
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Linear Layer · Sigmoid Activation · Tanh Activation · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam
