Collaborative construction of lexicographic and parallel datasets for African languages: first assessment
Elvis Mboning Tchiaze

TL;DR
This paper reports on a two-year collaborative effort to build open-source lexicographic datasets for African languages, addressing resource scarcity in NLP and AI applications.
Contribution
It introduces a collaborative platform for building and sharing lexicographic data in African languages, presented as the first of its kind in this field.
Findings
Two years of collaborative lexicographic data collection
Development of open-source datasets for African NLP
Initial assessment of resource quality and coverage
Abstract
Faced with a considerable lack of resources in African languages for work in Natural Language Processing (NLP), Natural Language Understanding (NLU) and artificial intelligence, the research teams of the NTeALan association have set themselves the objective of building open-source platforms for the collaborative construction of lexicographic data in African languages. In this article, we present our first report after two years of collaborative construction of lexicographic resources for African NLP tools.
Taxonomy
Topics: Natural Language Processing Techniques · Lexicography and Language Studies · Topic Modeling
