Collaborative construction of lexicographic and parallel datasets for African languages: first assessment

Elvis Mboning Tchiaze

arXiv:2103.16712 · cs.CL · April 1, 2021 · AfricaNLP

TL;DR

This paper reports on a two-year collaborative effort to build open-source lexicographic datasets for African languages, addressing resource scarcity in NLP and AI applications.

Contribution

It introduces open-source collaborative platforms for creating and sharing lexicographic data in African languages.

Findings

1. Two years of collaborative lexicographic data collection
2. Development of open-source datasets for African NLP
3. Initial assessment of resource quality and coverage

Abstract

Faced with a considerable lack of resources in African languages for work in Natural Language Processing (NLP), Natural Language Understanding (NLU) and artificial intelligence, the research teams of the NTeALan association have set themselves the objective of building open-source platforms for the collaborative construction of lexicographic data in African languages. In this article, we present our first report after two years of collaborative construction of lexicographic resources useful for African NLP tools.

Taxonomy

Topics: Natural Language Processing Techniques · Lexicography and Language Studies · Topic Modeling