Computerization of African languages-French dictionaries
Chantal Enguehard (LINA), Mathieu Mangeot (LIG)

TL;DR
This paper details the digitization and online deployment of five African language-French dictionaries using XML and LMF, enhancing NLP resources for under-resourced languages.
Contribution
It introduces a systematic methodology for converting Word-based dictionaries into XML following the LMF model, and publishes the converted dictionaries online, supporting NLP development for African languages.
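The summary names LMF but does not show what a converted entry looks like. The following is a minimal illustrative sketch that uses Python's standard `xml.etree.ElementTree`. It assumes the generic LMF XML serialization, with `LexicalResource`, `Lexicon`, `LexicalEntry`, `Lemma`, `Sense` and `Equivalent` elements, and with data carried in `<feat att="..." val="..."/>` children. The helper names `add_feat`, `make_entry` and `make_lexicon`, the feature names and values, and the sample Hausa entry are all hypothetical. The DiLAF dictionaries may use a different structure.

```python
import xml.etree.ElementTree as ET


def add_feat(parent: ET.Element, att: str, val: str) -> ET.Element:
    """Attach an LMF-style <feat att="..." val="..."/> child to parent."""
    return ET.SubElement(parent, "feat", att=att, val=val)


def make_entry(entry_id: str, headword: str, pos: str, french: str) -> ET.Element:
    """Build one LexicalEntry with a Lemma and a Sense holding a French Equivalent."""
    entry = ET.Element("LexicalEntry", id=entry_id)
    add_feat(entry, "partOfSpeech", pos)  # hypothetical feature value vocabulary

    lemma = ET.SubElement(entry, "Lemma")
    add_feat(lemma, "writtenForm", headword)

    sense = ET.SubElement(entry, "Sense", id=f"{entry_id}_s1")
    equivalent = ET.SubElement(sense, "Equivalent")
    add_feat(equivalent, "language", "fra")  # assumption: ISO 639-3 code for French
    add_feat(equivalent, "writtenForm", french)
    return entry


def make_lexicon(language: str, entries: list) -> ET.Element:
    """Wrap entries in a LexicalResource/Lexicon skeleton (hypothetical helper)."""
    resource = ET.Element("LexicalResource")
    lexicon = ET.SubElement(resource, "Lexicon")
    add_feat(lexicon, "language", language)
    lexicon.extend(entries)
    return resource


if __name__ == "__main__":
    # Illustrative Hausa entry ("ruwa" = "eau"), not taken from the DiLAF data.
    doc = make_lexicon("hau", [make_entry("hau_ruwa", "ruwa", "noun", "eau")])
    print(ET.tostring(doc, encoding="unicode"))
```

Storing properties in generic `feat` elements rather than in dedicated tags is one way to let a single schema accommodate dictionaries whose original Word layouts record different fields. A real converter would also have to parse the .doc source, which this sketch does not attempt.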
Findings
The five dictionaries were converted from Word into XML following the LMF model.
The converted dictionaries are available online on the Jibiki platform for lookup and editing.
The resulting resources support the development of NLP tools for under-resourced languages.
Abstract
This paper describes work done during the DiLAF project, which consists of converting five bilingual African language-French dictionaries, originally in Word format, into XML following the LMF model. The languages processed are Bambara, Hausa, Kanuri, Tamajaq and Songhai-zarma, which are still considered under-resourced with respect to Natural Language Processing tools. Once converted, the dictionaries are available online on the Jibiki platform for lookup and modification. The paper first presents the DiLAF project, followed by a description of each dictionary. It then presents the methodology for converting the .doc files into XML, discusses the use of Unicode, and details each step of the conversion into XML and LMF. The last part presents Jibiki, the lexical resource management platform used for the project.
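The abstract mentions a discussion of Unicode usage but does not say which issues it covers. As an illustration only, the sketch below shows two common clean-up steps for text extracted from word-processor files. The first replaces visually confusable code points. The second normalizes the result to Unicode NFC with the standard `unicodedata.normalize` function. The `LOOKALIKES` table and the `clean_form` helper are hypothetical and are not the paper's procedure.

```python
import unicodedata

# Hypothetical lookalike table: code points that can be typed by mistake in
# place of the intended letter. Greek small epsilon (U+03B5) and Latin small
# open e (U+025B) look almost identical in many fonts.
LOOKALIKES = {
    "\u03b5": "\u025b",  # GREEK SMALL LETTER EPSILON -> LATIN SMALL LETTER OPEN E
}


def clean_form(text: str) -> str:
    """Replace known lookalikes, then normalize to Unicode NFC."""
    replaced = "".join(LOOKALIKES.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFC", replaced)


if __name__ == "__main__":
    # 'e' + COMBINING ACUTE ACCENT composes to the single code point U+00E9.
    print(clean_form("e\u0301") == "\u00e9")  # True
    # Latin open e has no precomposed form with a grave accent, so NFC keeps
    # the combining mark as a separate code point.
    print(len(clean_form("\u025b\u0300")))  # 2
```

The replacement runs before normalization so that it operates on the base letters as typed, rather than on any precomposed forms that NFC might produce.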
Taxonomy
Topics: Lexicography and Language Studies
