Rule-and Dictionary-based Solution for Variations in Written Arabic Names in Social Networks, Big Data, Accounting Systems and Large Databases
Ahmad B.A. Hassanat, Ghada Awad Altarawneh

TL;DR
This paper presents a rule-based, dictionary-driven approach to handling variations in written Arabic names across social networks and large databases. It improves search and standardization, although the automatically generated dictionary still needs manual error correction.
Contribution
It introduces an automatic method to generate a comprehensive Arabic name dictionary with alternative forms, addressing the limitations of exact and approximate matching.
Findings
Rule-based generation from 9.9 million names produced a dictionary with 7% errors
Manual editing reduced errors but did not eliminate all inaccuracies
The approach aids in name standardization and search in large databases
Abstract
This paper investigates the problem that some Arabic names can be written in multiple ways. When someone searches for only one form of a name, neither exact nor approximate matching is appropriate for returning the multiple variants of the name. Exact matching requires the user to enter all forms of the name for the search, and approximate matching yields names not among the variations of the one being sought. In this paper, we attempt to solve the problem with a dictionary of all Arabic names mapped to their different (alternative) writing forms. We generated alternatives based on rules we derived from reviewing the first names of 9.9 million citizens and former citizens of Jordan. This dictionary can be used for both standardizing the written form when inserting a new name into a database and for searching for the name and all its alternative written forms. Creating the dictionary…
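The two uses the abstract names for the dictionary (standardizing a name on insert, and expanding a search to all written forms) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the four substitution rules are common Arabic spelling alternations, not the rule set the authors derived, and the function names `generate_variants`, `build_dictionary`, `standardize`, and `search_forms` are hypothetical.

```python
from collections import defaultdict

# Hypothetical substitution rules (NOT the paper's rule set): each pair maps a
# letter to a form it is often written as. Code points are spelled out to avoid
# any ambiguity between visually similar characters.
RULES = [
    ("\u0623", "\u0627"),  # alef with hamza above -> bare alef
    ("\u0625", "\u0627"),  # alef with hamza below -> bare alef
    ("\u0629", "\u0647"),  # taa marbuta -> haa
    ("\u0649", "\u064A"),  # alef maqsura -> yaa
]


def generate_variants(name):
    """Return the name plus every form reachable by applying the rules."""
    variants = {name}
    for src, dst in RULES:
        # Apply the rule to every variant found so far, keeping the originals.
        variants |= {v.replace(src, dst) for v in variants}
    return variants


def build_dictionary(canonical_names):
    """Build two lookups: variant -> canonical, and canonical -> all variants."""
    variant_to_canonical = {}
    canonical_to_variants = defaultdict(set)
    for name in canonical_names:
        for variant in generate_variants(name):
            # First canonical name to claim a variant wins; a real system
            # would need a policy for such collisions.
            variant_to_canonical.setdefault(variant, name)
            canonical_to_variants[name].add(variant)
    return variant_to_canonical, canonical_to_variants


def standardize(name, variant_to_canonical):
    """Map a written form to its canonical form; unknown names pass through."""
    return variant_to_canonical.get(name, name)


def search_forms(name, variant_to_canonical, canonical_to_variants):
    """Return every written form that a search for `name` should match."""
    canonical = standardize(name, variant_to_canonical)
    return set(canonical_to_variants.get(canonical, {name}))
```

With a dictionary built this way, a query for one spelling can be expanded into an exact-match search over all of its listed forms, which avoids both problems the abstract describes: the user does not have to type every form, and unrelated names are not returned as they can be with approximate matching.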
