TL;DR
KinyaBERT introduces a morphology-aware two-tier BERT architecture tailored for low-resource, morphologically rich languages like Kinyarwanda, improving performance on NLP tasks by explicitly modeling morphological structures.
Contribution
The paper proposes a novel two-tier BERT model that incorporates morphological analysis to better handle morphologically rich languages, demonstrating improved NLP task performance.
Findings
KinyaBERT outperforms baselines by 2% in F1 score on a named entity recognition (NER) task.
It achieves a 4.3% higher average score on a machine-translated GLUE benchmark.
It shows better convergence and more robust results across multiple tasks.
Abstract
Pre-trained language models such as BERT have been successful at tackling many natural language processing tasks. However, the unsupervised sub-word tokenization methods commonly used in these models (e.g., byte-pair encoding - BPE) are sub-optimal at handling morphologically rich languages. Even given a morphological analyzer, naive sequencing of morphemes into a standard BERT architecture is inefficient at capturing morphological compositionality and expressing word-relative syntactic regularities. We address these challenges by proposing a simple yet effective two-tier BERT architecture that leverages a morphological analyzer and explicitly represents morphological compositionality. Despite the success of BERT, most of its evaluations have been conducted on high-resource languages, obscuring its applicability on low-resource languages. We evaluate our proposed method on the…
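The two-tier idea described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the segmentations are hand-written for illustration rather than produced by a morphological analyzer, the names `encode_word`, `encode_sentence`, and `two_tier_encode` are hypothetical, mean pooling stands in for the morpheme-level encoder, and a single parameter-free attention pass stands in for the sentence-level encoder.

```python
import math
import random

random.seed(0)
DIM = 4  # toy embedding size (assumption, chosen only for readability)

# Hand-written, illustrative morpheme segmentations (assumption);
# a real system would obtain these from a morphological analyzer.
SEGMENTED = [
    ["tu", "ra", "som", "a"],
    ["igi", "tabo"],
]


def embed(morpheme, table):
    """Look up (or lazily create) a random toy embedding for a morpheme."""
    if morpheme not in table:
        table[morpheme] = [random.uniform(-1, 1) for _ in range(DIM)]
    return table[morpheme]


def encode_word(morphemes, table):
    """Tier 1 stand-in: mean-pool morpheme embeddings into one word vector."""
    vecs = [embed(m, table) for m in morphemes]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def encode_sentence(word_vecs):
    """Tier 2 stand-in: one parameter-free self-attention pass over words."""
    out = []
    for q in word_vecs:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(DIM)
            for k in word_vecs
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[i] for w, v in zip(weights, word_vecs)) for i in range(DIM)]
        )
    return out


def two_tier_encode(segmented_sentence):
    """Compose morphemes into words first, then contextualize the words."""
    table = {}
    word_vecs = [encode_word(ms, table) for ms in segmented_sentence]
    return encode_sentence(word_vecs)


contextual = two_tier_encode(SEGMENTED)
print(len(contextual), len(contextual[0]))  # one vector per word, DIM values each
```

The point of the sketch is the shape of the computation: the sentence-level tier sees one vector per word rather than a flat sequence of sub-word pieces, so word boundaries and morphological composition are explicit in the architecture.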
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Attention Dropout · Weight Decay · WordPiece · Dropout · Layer Normalization
