Comparing Open Arabic Named Entity Recognition Tools

Abdullah Aldumaykhi; Saad Otai; Abdulkareem Alsudais

arXiv:2205.05857 · cs.CL · May 13, 2022 · 1 citation

TL;DR

This paper compares three open Arabic NER tools, introduces merging and voting methods to improve performance, and evaluates their effectiveness on a new COVID-19 related corpus, highlighting tradeoffs between precision and recall.

Contribution

It provides a comparative analysis of Arabic NER tools and proposes combined methods to enhance entity recognition performance.

Findings

01

Merging the three tools' results yields the highest overall F1 scores.

02

Hatmi and Stanza perform similarly, with Hatmi achieving the highest F1 score across the three entity types.

03

Merging favors recall; voting favors precision.

Abstract

The main objective of this paper is to compare and evaluate the performances of three open Arabic NER tools: CAMeL, Hatmi, and Stanza. We collected a corpus consisting of 30 articles written in MSA and manually annotated all the entities of the person, organization, and location types at the article (document) level. Our results suggest a similarity between Stanza and Hatmi with the latter receiving the highest F1 score for the three entity types. However, CAMeL achieved the highest precision values for names of people and organizations. Following this, we implemented a "merge" method that combined the results from the three tools and a "vote" method that tagged named entities only when two of the three identified them as entities. Our results showed that merging achieved the highest overall F1 scores. Moreover, merging had the highest recall values while voting had the highest…
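The "merge" and "vote" methods described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes each tool's document-level output has already been reduced to a set of (entity text, entity type) pairs, and it reads "two of the three" as "at least two". The function names (`merge`, `vote`, `precision_recall_f1`) and the toy entities are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Assumption: one entity = (surface text, type), compared at the document level.
Entity = tuple[str, str]


def merge(*tool_outputs: set[Entity]) -> set[Entity]:
    """Keep an entity if ANY tool tagged it (set union)."""
    return set().union(*tool_outputs)


def vote(*tool_outputs: set[Entity], min_votes: int = 2) -> set[Entity]:
    """Keep an entity only if at least `min_votes` tools tagged it."""
    counts = Counter(entity for output in tool_outputs for entity in output)
    return {entity for entity, n in counts.items() if n >= min_votes}


def precision_recall_f1(predicted: set[Entity], gold: set[Entity]) -> tuple[float, float, float]:
    """Standard set-based precision, recall, and F1 against gold annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented toy data, not taken from the paper's corpus or results.
gold = {("Riyadh", "LOC"), ("WHO", "ORG"), ("Pfizer", "ORG")}
tool_a = {("Riyadh", "LOC"), ("WHO", "ORG")}
tool_b = {("Riyadh", "LOC"), ("Pfizer", "ORG")}
tool_c = {("Riyadh", "LOC"), ("WHO", "ORG"), ("Corona", "PER")}

merged = merge(tool_a, tool_b, tool_c)
voted = vote(tool_a, tool_b, tool_c)

merge_scores = precision_recall_f1(merged, gold)
vote_scores = precision_recall_f1(voted, gold)
```

On this toy input, merging recovers every gold entity but also keeps the one spurious entity that a single tool produced, while voting drops that spurious entity along with a correct entity that only one tool found. That is the same recall-versus-precision tradeoff the abstract reports, although the actual numbers depend entirely on the tools and the corpus.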

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies