Development of Pre-Trained Transformer-based Models for the Nepali Language
Prajwal Thapa, Jinu Nyachhyon, Mridul Sharma, Bal Krishna Bal

TL;DR
This paper introduces Nepali-specific transformer models (BERT, RoBERTa, GPT-2) trained on a large corpus, significantly advancing NLP capabilities for Nepali through improved understanding and generation tasks.
Contribution
It is the first to pre-train multiple transformer models on a large Nepali corpus and explore instruction tuning, filling a resource gap for Nepali NLP research.
Findings
The models outperform the previous best on the Nep-gLUE benchmark by 2 points.
Achieved state-of-the-art results in Nepali text generation.
Collected 27.5 GB of Nepali text, approximately 2.4x larger than any previously available Nepali corpus.
Abstract
Transformer-based pre-trained language models have dominated the field of Natural Language Processing (NLP) for quite some time now. However, the Nepali language, spoken by approximately 32 million people worldwide, remains significantly underrepresented in this domain. This underrepresentation is primarily attributed to the scarcity of monolingual data corpora and limited available resources for the Nepali language. While existing efforts have predominantly concentrated on basic encoder-based models, there is a notable gap in the exploration of decoder-based architectures. To address this gap, we have collected 27.5 GB of Nepali text data, approximately 2.4x larger than any previously available Nepali language corpus. Leveraging this data, we pre-trained three different models, namely BERT, RoBERTa, and GPT-2, exclusively for the Nepali language. Furthermore, we performed instruction…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Discriminative Fine-Tuning · Cosine Annealing · Linear Layer · Byte Pair Encoding · Adam · RoBERTa · Residual Connection · Weight Decay · Softmax
