MotàMot project: conversion of a French-Khmer published dictionary for building a multilingual lexical system
Mathieu Mangeot (LIG)

TL;DR
The MotàMot project develops a multilingual lexical system for Khmer by converting a French-Khmer dictionary into a digital, accessible resource, facilitating language processing and technological development in Cambodia.
Contribution
It introduces a novel method for digitizing and structuring Khmer lexical data using a pivot macrostructure and conversion techniques from a bilingual dictionary.
Findings
Created an online accessible lexical database for Khmer
Successfully converted Khmer headwords from IPA to Khmer script using OpenFST
Integrated French and Khmer lexical data into a multilingual system
Abstract
Economic issues related to information processing techniques are very important. The development of such technologies is a major asset for developing countries like Cambodia and Laos, and emerging ones like Vietnam, Malaysia and Thailand. The MotàMot project aims to computerize an under-resourced language: Khmer, spoken mainly in Cambodia. The main goal of the project is the development of a multilingual lexical system targeted at Khmer. The macrostructure is a pivot one, with each word sense of each language linked to a pivot axie. The microstructure comes from a simplification of the explanatory and combinatorial dictionary. The lexical system has been initialized with data coming mainly from the conversion of Denis Richer's French-Khmer bilingual dictionary from Word to XML format. The French part was completed with pronunciation and parts of speech coming from the FeM…
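The pivot macrostructure described in the abstract can be illustrated with a minimal Python sketch. It is not the project's actual data model: the names `WordSense`, `PivotLexicon`, `link` and `translations`, the pivot identifier `pivot-0001`, and the sample entries are all hypothetical and chosen only for illustration. The idea shown is that word senses are never linked to each other directly; each one is attached to a language-independent pivot node, and translations are found by going through that node.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WordSense:
    """One sense of one headword in one language (hypothetical model)."""
    lang: str      # language code, e.g. "fra" or "khm"
    headword: str
    sense_id: str  # illustrative identifier, not taken from the project


class PivotLexicon:
    """Word senses of every language are linked only to pivot nodes."""

    def __init__(self):
        self._pivot_to_senses = defaultdict(set)
        self._sense_to_pivots = defaultdict(set)

    def link(self, pivot_id, sense):
        """Attach a word sense to a pivot node."""
        self._pivot_to_senses[pivot_id].add(sense)
        self._sense_to_pivots[sense].add(pivot_id)

    def translations(self, sense, target_lang):
        """Senses in target_lang that share a pivot node with `sense`."""
        found = set()
        for pivot_id in self._sense_to_pivots.get(sense, ()):
            for other in self._pivot_to_senses[pivot_id]:
                if other.lang == target_lang and other != sense:
                    found.add(other)
        return sorted(found, key=lambda s: s.headword)


# Illustrative data only: French "maison" and Khmer "ផ្ទះ" (house).
lexicon = PivotLexicon()
maison = WordSense("fra", "maison", "maison.1")
phteah = WordSense("khm", "ផ្ទះ", "ផ្ទះ.1")
lexicon.link("pivot-0001", maison)
lexicon.link("pivot-0001", phteah)

print([s.headword for s in lexicon.translations(maison, "khm")])
```

With this layout, adding a third language only requires linking its word senses to existing pivot nodes; no new pairwise links between languages are needed.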
Taxonomy
Topics: Linguistic Studies and Language Acquisition
