Rule-and Dictionary-based Solution for Variations in Written Arabic Names in Social Networks, Big Data, Accounting Systems and Large Databases

Ahmad B.A. Hassanat; Ghada Awad Altarawneh

arXiv:1502.05441·cs.DB·February 20, 2015

TL;DR

This paper presents a rule-based, dictionary-driven approach to handling variations in written Arabic names across social networks and large databases. The dictionary supports both search and standardization, although the automatically generated entries still required manual error correction.

Contribution

It introduces an automatic method to generate a comprehensive Arabic name dictionary with alternative forms, addressing the limitations of exact and approximate matching.

Findings

01

Generated a dictionary from 9.9 million names with 7% errors

02

Manual editing reduced errors but did not eliminate all inaccuracies

03

The approach aids in name standardization and search in large databases

Abstract

This paper investigates the problem that some Arabic names can be written in multiple ways. When someone searches for only one form of a name, neither exact nor approximate matching is appropriate for returning the multiple variants of the name. Exact matching requires the user to enter all forms of the name for the search, and approximate matching yields names not among the variations of the one being sought. In this paper, we attempt to solve the problem with a dictionary of all Arabic names mapped to their different (alternative) writing forms. We generated alternatives based on rules we derived from reviewing the first names of 9.9 million citizens and former citizens of Jordan. This dictionary can be used for both standardizing the written form when inserting a new name into a database and for searching for the name and all its alternative written forms. Creating the dictionary…
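
The two uses of the dictionary described in the abstract, standardizing a name on insert and expanding a search to all written forms, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the variant groups, the choice of the first entry as the standard form, and the function names `build_dictionary`, `standardize`, and `expand_query` are all assumptions made for the example, and the two sample groups are common orthographic variants rather than entries taken from the paper's dictionary.

```python
# Hypothetical variant groups, for illustration only (not from the paper's data).
# Each inner list holds written forms that are treated as the same name.
VARIANT_GROUPS = [
    ["أحمد", "احمد"],        # alef with and without hamza
    ["عبدالله", "عبد الله"],  # compound name written joined or with a space
]


def build_dictionary(groups):
    """Map every written form to (standard form, all forms in its group)."""
    lookup = {}
    for group in groups:
        standard = group[0]  # assumption: the first entry is the standard form
        for form in group:
            lookup[form] = (standard, tuple(group))
    return lookup


def standardize(name, lookup):
    """Return the standard form to store; unknown names are kept unchanged."""
    return lookup.get(name, (name, (name,)))[0]


def expand_query(name, lookup):
    """Return every written form to search for; unknown names match only themselves."""
    return list(lookup.get(name, (name, (name,)))[1])


LOOKUP = build_dictionary(VARIANT_GROUPS)

# Standardize on insert, then search with all alternatives of the queried form.
stored = standardize(VARIANT_GROUPS[0][1], LOOKUP)
search_terms = expand_query(VARIANT_GROUPS[1][1], LOOKUP)
```

Because every form in a group points to the same entry, a query for any single form returns the whole group, which is the behavior that neither exact nor approximate matching provides on its own.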
