A Benchmark and Scoring Algorithm for Enriching Arabic Synonyms
Sana Ghanem, Mustafa Jarrar, Radi Jarrar, Ibrahim Bounhas

TL;DR
This paper introduces an algorithm and a benchmark dataset for enriching Arabic synsets with additional synonyms, each scored with a fuzzy value that expresses synonymy strength.
Contribution
It presents a novel algorithm for extracting and scoring synonyms from lexicons, along with a benchmark dataset annotated by linguists for evaluation.
Findings
The algorithm's fuzzy values closely match linguist annotations.
The dataset, in which four linguists score 3K candidate synonyms for 500 synsets, shows how much linguists agree and disagree on synonymy.
The approach automates synonym enrichment by extracting candidates from existing lexicons and keeping those whose fuzzy value exceeds a chosen threshold.
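One way to make "closely match" and "levels of agreement" concrete is sketched below. It is only an illustration: the summary does not state which metrics the paper uses, so mean absolute error and per-candidate standard deviation are assumptions, the function names are hypothetical, and the numbers are toy values rather than entries from the benchmark.

```python
from statistics import mean, pstdev
from typing import Dict, List


def mean_absolute_error(algo: Dict[str, float],
                        annotations: Dict[str, List[float]]) -> float:
    """Average gap between the algorithm's score and the mean linguist score.

    Assumed metric, not necessarily the one reported in the paper.
    """
    diffs = [abs(algo[cand] - mean(scores))
             for cand, scores in annotations.items()]
    return mean(diffs)


def disagreement(annotations: Dict[str, List[float]]) -> Dict[str, float]:
    """Per-candidate spread of linguist scores (population standard deviation)."""
    return {cand: pstdev(scores) for cand, scores in annotations.items()}


# Toy values for illustration only: four linguist scores per candidate.
annotations = {
    "cand_a": [1.0, 1.0, 1.0, 1.0],   # full agreement
    "cand_b": [0.0, 0.5, 0.5, 1.0],   # visible disagreement
}
algo_scores = {"cand_a": 0.9, "cand_b": 0.6}

mae = mean_absolute_error(algo_scores, annotations)
spread = disagreement(annotations)
```

A low error alongside a non-trivial spread would suggest the algorithm sits within the range of human variation; other agreement measures could be substituted without changing the structure.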
Abstract
This paper addresses the task of extending a given synset with additional synonyms, taking into account synonymy strength as a fuzzy value. Given a mono/multilingual synset and a threshold (a fuzzy value in [0, 1]), our goal is to extract new synonyms above this threshold from existing lexicons. Our contribution is twofold: an algorithm and a benchmark dataset. The dataset consists of 3K candidate synonyms for 500 synsets. Each candidate synonym is annotated with a fuzzy value by four linguists. The dataset is important for (i) understanding how much linguists (dis)agree on synonymy and (ii) serving as a baseline to evaluate our algorithm. Our proposed algorithm extracts synonyms from existing lexicons and computes a fuzzy value for each candidate. Our evaluations show that the algorithm behaves like a linguist and its fuzzy values are close to those proposed by…
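The task stated in the abstract (synset plus threshold in, scored synonyms out) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the scoring rule here (the fraction of synset members whose lexicon entry lists the candidate) is a stand-in assumption, the lexicon is a toy dictionary, and `score_candidates` and `enrich` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple


def score_candidates(synset: Set[str],
                     lexicon: Dict[str, Set[str]]) -> Dict[str, float]:
    """Give each term outside the synset a fuzzy value in [0, 1].

    Assumed rule: share of synset members that list the candidate
    as a synonym in the lexicon. The paper's scoring may differ.
    """
    counts: Dict[str, int] = {}
    for member in synset:
        for cand in lexicon.get(member, set()):
            if cand not in synset:
                counts[cand] = counts.get(cand, 0) + 1
    return {cand: n / len(synset) for cand, n in counts.items()}


def enrich(synset: Set[str],
           lexicon: Dict[str, Set[str]],
           threshold: float) -> List[Tuple[str, float]]:
    """Return candidates scoring at or above the threshold, best first.

    Treating the threshold as inclusive is an assumption.
    """
    scores = score_candidates(synset, lexicon)
    kept = [(cand, s) for cand, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Toy lexicon for illustration only.
lexicon = {
    "car": {"automobile", "auto", "vehicle"},
    "automobile": {"car", "auto", "motorcar"},
}
synset = {"car", "automobile"}

strict = enrich(synset, lexicon, threshold=0.6)
loose = enrich(synset, lexicon, threshold=0.5)
```

Raising the threshold keeps only candidates supported by most of the synset, while lowering it admits weaker synonyms, which is the trade-off the fuzzy value is meant to expose.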
Taxonomy
Topics: Natural Language Processing Techniques · Linguistics and Terminology Studies · Topic Modeling
