JABER and SABER: Junior and Senior Arabic BERt
Abbas Ghaddar, Yimeng Wu, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais

TL;DR
This paper introduces JABER and SABER, two pre-trained Arabic language models that outperform previously released Arabic BERT models on the ALUE benchmark and on a well-established NER benchmark. The authors report that the earlier models were significantly under-trained.
Contribution
The paper presents the development and evaluation of JABER and SABER, two dedicated Arabic BERT models that achieve state-of-the-art results on ALUE and on an established NER benchmark.
Findings
JABER and SABER outperform previous Arabic models on ALUE and NER benchmarks.
Dedicated, sufficiently trained Arabic BERT models improve performance on Arabic NLU tasks.
Experimental results confirm the effectiveness of language-specific pre-training.
Abstract
Language-specific pre-trained models have proven to be more accurate than multilingual ones in a monolingual evaluation setting, and Arabic is no exception. However, we found that previously released Arabic BERT models were significantly under-trained. In this technical report, we present JABER and SABER, Junior and Senior Arabic BERt respectively, our pre-trained language model prototypes dedicated to Arabic. We conduct an empirical study to systematically evaluate the performance of models across a diverse set of existing Arabic NLU tasks. Experimental results show that JABER and SABER achieve state-of-the-art performance on ALUE, a new benchmark for Arabic Language Understanding Evaluation, as well as on a well-established NER benchmark.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Linear Warmup With Linear Decay · Softmax · Weight Decay · WordPiece · Adam · Residual Connection
