The Importance of Context in Very Low Resource Language Modeling
Lukas Edman, Antonio Toral, Gertjan van Noord

TL;DR
This paper highlights the significance of local context in very low resource language modeling, demonstrating that statistical n-gram models outperform neural models in extremely data-scarce scenarios and proposing methods to enhance neural models.
Contribution
The paper introduces three techniques for improving neural language models in low-resource settings. The most effective of these, limiting the model's self-attention, improves downstream task performance.
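One common way to limit self-attention is a fixed local window, where each token may only attend to neighbors within a set distance. The sketch below illustrates that idea in plain Python. Whether it matches the paper's exact formulation is an assumption, and the names `local_attention_mask`, `masked_softmax`, and `window` are hypothetical, chosen for illustration only.

```python
import math


def local_attention_mask(seq_len, window):
    """Return mask[i][j] == True when position i may attend to position j.

    Assumption: a symmetric hard window, |i - j| <= window.
    """
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]


def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores; masked-out positions get weight 0."""
    allowed = [s for s, m in zip(scores, mask_row) if m]
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: 6 tokens, each attending only to itself and its direct neighbors.
mask = local_attention_mask(seq_len=6, window=1)
weights = masked_softmax([0.5, 1.0, 2.0, 0.1, 0.3, 0.7], mask[2])
```

With `window=1`, the token at index 2 distributes all of its attention over indices 1, 2, and 3, which mirrors the local focus that the paper credits for the strength of n-gram models.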
Findings
Statistical n-gram models outperform state-of-the-art neural models when fewer than 100 thousand sentences are available.
Limiting self-attention improves neural model performance on downstream tasks such as NLI and POS tagging by up to 5%.
Gains are reported for all three languages tested: English, Hindi, and Turkish.
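To make the "local context" of a statistical n-gram model concrete, here is a minimal bigram language model sketch. It conditions each word only on the single preceding word. Add-one smoothing is used for brevity and is an assumption, not the smoothing used in the paper; the toy corpus and the function names `train_bigram` and `bigram_prob` are likewise hypothetical.

```python
from collections import Counter


def train_bigram(sentences):
    """Count context words and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1          # how often prev appears as a context
            bigrams[(prev, cur)] += 1    # how often cur follows prev
    return unigrams, bigrams, vocab


def bigram_prob(prev, cur, unigrams, bigrams, vocab):
    """P(cur | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab))


# Toy corpus (illustrative only).
corpus = ["the cat sat", "the dog sat", "the cat ran"]
unigrams, bigrams, vocab = train_bigram(corpus)
p_cat = bigram_prob("the", "cat", unigrams, bigrams, vocab)
p_dog = bigram_prob("the", "dog", unigrams, bigrams, vocab)
```

Because the model only ever looks one word back, it needs far fewer parameters than a neural model with full self-attention, which is one plausible reading of why it holds up in very small data regimes.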
Abstract
This paper investigates very low resource language model pretraining, when less than 100 thousand sentences are available. We find that, in very low resource scenarios, statistical n-gram language models outperform state-of-the-art neural models. Our experiments show that this is mainly due to the focus of the former on a local context. As such, we introduce three methods to improve a neural model's performance in the low-resource setting, finding that limiting the model's self-attention is the most effective one, improving on downstream tasks such as NLI and POS tagging by up to 5% for the languages we test on: English, Hindi, and Turkish.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
