NER-RoBERTa: Fine-Tuning RoBERTa for Named Entity Recognition (NER) within low-resource languages
Abdulhady Abas Abdullah, Srwa Hasan Abdulla, Dalia Mohammad Toufiq, Halgurd S. Maghdid, Tarik A. Rashid, Pakshan F. Farho, Shadan Sh. Sabr, Akar H. Taher, Darya S. Hamad, Hadi Veisi, and Aras T. Asaad

TL;DR
This paper presents a method for fine-tuning RoBERTa for Kurdish Named Entity Recognition, builds a Kurdish corpus, and demonstrates significant performance improvements over traditional models.
Contribution
It introduces a novel fine-tuning approach for RoBERTa on Kurdish NER, including corpus creation, model modifications, and experimental validation.
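The summary does not spell out the preprocessing, so the following is only a minimal sketch of a step that is common when fine-tuning a subword model such as RoBERTa for NER: word-level BIO tags must be spread over subword pieces. Here the first piece of each word keeps the word's label and continuation pieces get -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so they do not contribute to the loss. The `toy_sentencepiece` function is a hypothetical stand-in for a trained SentencePiece model (it just cuts words into 3-character chunks and marks word starts with "▁"), and the label set is illustrative.

```python
from typing import Callable, Dict, List, Tuple

IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss skips targets equal to this value


def toy_sentencepiece(word: str, size: int = 3) -> List[str]:
    """Hypothetical stand-in for a real SentencePiece model.

    Splits a word into fixed-size chunks and prefixes the first chunk with
    U+2581 ("▁"), the marker SentencePiece uses for a word boundary.
    """
    chunks = [word[i:i + size] for i in range(0, len(word), size)] or [""]
    chunks[0] = "\u2581" + chunks[0]
    return chunks


def align_labels(
    words: List[str],
    word_labels: List[str],
    tokenize_word: Callable[[str], List[str]],
    label2id: Dict[str, int],
) -> Tuple[List[str], List[int]]:
    """Expand word-level BIO tags to subword level.

    The first piece of each word keeps the word's label id; the remaining
    pieces of that word get IGNORE_INDEX.
    """
    pieces: List[str] = []
    label_ids: List[int] = []
    for word, label in zip(words, word_labels):
        word_pieces = tokenize_word(word)
        pieces.extend(word_pieces)
        label_ids.append(label2id[label])
        label_ids.extend([IGNORE_INDEX] * (len(word_pieces) - 1))
    return pieces, label_ids


# Illustrative label set and sentence (not taken from the paper's corpus).
label2id = {"O": 0, "B-PER": 1, "I-PER": 2, "B-LOC": 3, "I-LOC": 4}
words = ["Aras", "lives", "in", "Erbil"]
tags = ["B-PER", "O", "O", "B-LOC"]

pieces, label_ids = align_labels(words, tags, toy_sentencepiece, label2id)
print(pieces)
print(label_ids)
```

Labeling only the first piece is one common convention; another is to copy the word's label (converted from B- to I-) onto every piece. Either way, predictions are mapped back to words before scoring.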
Findings
12.8% F1-score improvement over traditional models
SentencePiece tokenization enhances NER performance
Establishment of a new benchmark for Kurdish NLP
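The findings are stated in terms of F1-score. NER is usually scored at the entity level rather than per token: a predicted entity counts as correct only if both its type and its exact boundaries match a gold entity. The sketch below assumes BIO tagging and exact-span matching, and treats an I- tag that does not continue an open entity of the same type as the start of a new entity (a lenient convention). The paper's own evaluation script is not shown here, so this illustrates the metric rather than its reported setup.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence's BIO tag sequence."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # the open entity continues
        if etype is not None:
            spans.add((etype, start, i))  # close the open entity
            etype = None
        if tag.startswith(("B-", "I-")):
            etype, start = tag[2:], i  # open a new entity
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 over a list of sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)  # exact type-and-boundary matches
        n_gold += len(gs)
        n_pred += len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the person is found, the location is missed.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5, F1 = 2/3
print(round(score, 4))
```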
Abstract
Nowadays, Natural Language Processing (NLP) is an important tool in most people's daily routines, with applications ranging from speech understanding, translation, named entity recognition (NER), and text categorization to generative text models such as ChatGPT. Thanks to big data and the resulting large corpora available for widely used languages such as English, Spanish, Turkish, and Persian, accurate applications have been developed for these languages. Kurdish, however, still lacks the corpora and large datasets needed for inclusion in NLP applications. Its rich linguistic structure, varied dialects, and limited data pose unique challenges for Kurdish NLP (KNLP) application development. While several studies have addressed various KNLP applications, Kurdish NER (KNER) remains a challenge for many KNLP tasks, including text analysis…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
