MasakhaNER: Named Entity Recognition for African Languages
David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D'souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen Muhammad, Chris Chinenye Emezue

TL;DR
This paper introduces MasakhaNER, a high-quality NER dataset for ten African languages, addressing under-representation in NLP and enabling future research with comprehensive data, analysis, and evaluation of methods.
Contribution
It provides the first large, publicly available NER dataset for African languages, including analysis and evaluation of state-of-the-art methods in supervised and transfer learning.
Findings
State-of-the-art models perform variably across languages.
Transfer learning improves NER performance in low-resource settings.
The dataset reveals unique linguistic challenges of African languages.
Abstract
We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders. We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. We release the data, code, and models in order to inspire future research on African NLP.
Code & Models
- 🤗 Davlan/bert-base-multilingual-cased-masakhaner (model) · 55 dl · ♡ 4
- 🤗 Davlan/distilbert-base-multilingual-cased-masakhaner (model) · 52 dl · ♡ 2
- 🤗 Davlan/xlm-roberta-base-masakhaner (model) · 483 dl · ♡ 1
- 🤗 Davlan/xlm-roberta-large-masakhaner (model) · 143 dl · ♡ 2
- 🤗 mbeukman/xlm-roberta-base-finetuned-amharic-finetuned-ner-amharic (model) · 34 dl · ♡ 1
- 🤗 mbeukman/xlm-roberta-base-finetuned-amharic-finetuned-ner-swahili (model) · 4 dl · ♡ 1
- 🤗 mbeukman/xlm-roberta-base-finetuned-hausa-finetuned-ner-hausa (model) · 1 dl
- 🤗 mbeukman/xlm-roberta-base-finetuned-hausa-finetuned-ner-swahili (model) · 4 dl
- 🤗 mbeukman/xlm-roberta-base-finetuned-igbo-finetuned-ner-igbo (model) · 1 dl
- 🤗 mbeukman/xlm-roberta-base-finetuned-igbo-finetuned-ner-swahili (model) · 5 dl · ♡ 1
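The checkpoints listed above are hosted on the Hugging Face Hub, so one plausible way to try them is through the `transformers` token-classification pipeline. The sketch below is illustrative and is not taken from the paper's released code. The helper names `load_tagger`, `format_entities`, and `demo` are hypothetical. It also assumes the chosen checkpoint ships a token-classification head and tokenizer that the pipeline can load, and that `transformers` plus a backend such as PyTorch are installed.

```python
# Minimal sketch: tag named entities with one of the listed checkpoints.
# Helper names are hypothetical; the model id is copied from the list above.

MODEL_ID = "Davlan/xlm-roberta-base-masakhaner"


def load_tagger(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the pure helper below works without transformers.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def format_entities(entities, min_score=0.0):
    """Reduce aggregated pipeline output to (text, label) pairs.

    Each input item is expected to be a dict with at least the keys
    "word", "entity_group", and "score", as produced by an aggregated
    token-classification pipeline.
    """
    return [
        (ent["word"], ent["entity_group"])
        for ent in entities
        if ent["score"] >= min_score
    ]


def demo(text, min_score=0.5):
    """Tag a caller-supplied sentence and print the entities that were kept."""
    tagger = load_tagger()
    for word, label in format_entities(tagger(text), min_score=min_score):
        print(f"{label}\t{word}")
```

Calling `demo(...)` with a sentence in one of the covered languages prints one tab-separated label and span per line. Swapping `MODEL_ID` for another entry in the list should work the same way, provided that checkpoint exposes the same kind of head.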
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
