WikiTermBase: An AI-Augmented Term Base to Standardize Arabic Translation on Wikipedia
Michel Bakni (ESTIA), Abbad Diraneyya, Wael Tellat

TL;DR
WikiTermBase is an open-source tool that uses NLP and LLMs to build a large, standardized Arabic term database, improving translation consistency on Wikipedia.
Contribution
The paper introduces a systematic approach to building a comprehensive Arabic lexicographical database and demonstrates its application to standardizing terms on Wikipedia.
Findings
Created a database of over 900K terms from multiple sources.
Successfully applied the tool to standardize Arabic translations on Wikipedia.
Enhanced consistency in Arabic technical terminology.
Abstract
Term bases are recognized as one of the most effective components of translation software for saving time and ensuring consistency. Despite many recent advances in natural language processing (NLP) and large language models (LLMs), major translation platforms have yet to take advantage of these tools to improve their term bases and support scalable content for underrepresented languages, which often struggle with localizing technical terminology. Language academies in the Arab World, for example, have struggled since the 1940s to unify the way new scientific terms enter the Arabic language at scale. This abstract introduces an open-source tool, WikiTermBase, with a systematic approach for building a lexicographical database of over 900K terms, which were collected and mapped from a multitude of sources on a semantic and morphological basis. The tool was successfully implemented on…
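To make the idea of collecting and mapping terms from several sources concrete, here is a minimal sketch in Python. It is not WikiTermBase's implementation: the source names, the sample entries, and the functions `normalize`, `build_term_base`, and `preferred_translation` are all hypothetical, and simple frequency voting stands in for the semantic and morphological mapping the abstract describes.

```python
from collections import Counter, defaultdict


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so source variants share one key."""
    return " ".join(term.lower().split())


def build_term_base(entries):
    """Group (source, english, arabic) triples by normalized English term.

    Each key maps to a Counter of Arabic translations, counting how many
    sources attest each one. (Assumption: one vote per source entry.)
    """
    base = defaultdict(Counter)
    for _source, english, arabic in entries:
        base[normalize(english)][arabic] += 1
    return base


def preferred_translation(base, english):
    """Return the most frequently attested Arabic translation, or None."""
    counts = base.get(normalize(english))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical sample data: three invented sources for one English term.
entries = [
    ("dictionary_a", "Computer", "حاسوب"),
    ("dictionary_b", "computer", "حاسوب"),
    ("dictionary_c", "computer ", "كمبيوتر"),
]
base = build_term_base(entries)
print(preferred_translation(base, "COMPUTER"))
```

In this sketch, a term attested by more sources wins, and a term missing from every source returns `None` rather than a guess. A real term base would also need to handle ties, domain labels, and morphological variants of the Arabic forms.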
