MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages
Cheikh M. Bamba Dione, David Adelani, Peter Nabende, Jesujoba Alabi, Thapelo Sindane, Happy Buzaaba, Shamsuddeen Hassan Muhammad, Chris Chinenye Emezue, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jonathan Mukiibi, Blessing Sibanda

TL;DR
This paper introduces MasakhaPOS, a comprehensive POS dataset for 20 African languages, and explores cross-lingual transfer methods to improve POS tagging performance across typologically diverse languages.
Contribution
It provides the largest POS dataset for African languages and evaluates cross-lingual transfer techniques, highlighting the importance of language similarity and fine-tuning methods.
Findings
Transfer from similar languages improves POS tagging accuracy.
Multilingual models outperform single-language models.
Cross-lingual fine-tuning enhances performance significantly.
Abstract
In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the Universal Dependencies (UD) guidelines. We conducted extensive POS baseline experiments using a conditional random field (CRF) and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with cross-lingual parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems more effective for POS…
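The idea of choosing the best transfer language can be illustrated with a minimal sketch. It assumes token-level accuracy as the evaluation metric (the abstract does not name one), and the function names `token_accuracy` and `best_source`, the source labels `source_a` and `source_b`, and all tag sequences are hypothetical toy values, not data or results from the paper.

```python
from typing import Dict, List

Tags = List[List[str]]  # one list of UPOS tags per sentence


def token_accuracy(gold: Tags, pred: Tags) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted sentences differ in length")
        correct += sum(g == p for g, p in zip(g_sent, p_sent))
        total += len(g_sent)
    return correct / total if total else 0.0


def best_source(gold: Tags, preds_by_source: Dict[str, Tags]) -> str:
    """Return the source language whose transferred tagger scores highest."""
    return max(
        preds_by_source,
        key=lambda src: token_accuracy(gold, preds_by_source[src]),
    )


# Toy, made-up target-language data (assumption: not taken from MasakhaPOS).
gold = [["NOUN", "VERB", "DET", "NOUN"], ["PRON", "VERB"]]

# Hypothetical predictions from taggers trained on two different source languages.
preds_by_source = {
    "source_a": [["NOUN", "VERB", "DET", "NOUN"], ["PRON", "NOUN"]],
    "source_b": [["NOUN", "NOUN", "DET", "VERB"], ["PRON", "VERB"]],
}

scores = {src: token_accuracy(gold, p) for src, p in preds_by_source.items()}
best = best_source(gold, preds_by_source)
print(scores, best)
```

In a multi-source setup, the same scoring could be applied to a tagger trained on several source languages at once; in practice the selection would be made on held-out development data rather than on the test set.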
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
