Arabic Keyphrase Extraction using Linguistic knowledge and Machine Learning Techniques
Tarek El-shishtawy, Abdulwahab Al-sammak

TL;DR
This paper presents a supervised machine learning approach for extracting Arabic keyphrases that incorporates linguistic knowledge and syntactic rules, achieving significantly improved precision and recall over existing systems.
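The "syntactic rules" mentioned above decide which word sequences may form a candidate keyphrase, based on part-of-speech tags. Below is a minimal sketch of that filtering step. It assumes a hypothetical tagset (N, ADJ, P) and a hypothetical set of allowed tag patterns. The paper's actual rule set and tagger are not given in this summary.

```python
# Hypothetical allowed tag sequences; the paper's actual rule set is not reproduced here.
# Tags: N = noun, ADJ = adjective, P = preposition.
ALLOWED_PATTERNS = {
    ("N",),
    ("N", "N"),
    ("N", "ADJ"),
    ("N", "N", "ADJ"),
    ("N", "ADJ", "ADJ"),
}
MAX_LEN = max(len(p) for p in ALLOWED_PATTERNS)


def extract_candidates(tagged_tokens):
    """Return every contiguous span whose POS-tag sequence is an allowed pattern."""
    candidates = []
    n = len(tagged_tokens)
    for start in range(n):
        for length in range(1, MAX_LEN + 1):
            end = start + length
            if end > n:
                break
            span = tagged_tokens[start:end]
            tags = tuple(tag for _, tag in span)
            if tags in ALLOWED_PATTERNS:
                candidates.append(" ".join(word for word, _ in span))
    return candidates


# Transliterated tokens for "extracting key phrases from texts", hand-tagged for illustration.
tokens = [
    ("istikhraj", "N"),
    ("al-ibarat", "N"),
    ("al-miftahiyya", "ADJ"),
    ("min", "P"),
    ("al-nusus", "N"),
]
print(extract_candidates(tokens))
```

Because no allowed pattern contains P, the preposition "min" blocks any candidate that would span it. As a result, "al-nusus" is emitted only on its own, and the adjective "al-miftahiyya" never starts a candidate.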
Contribution
It introduces a novel Arabic keyphrase extraction method that combines linguistic features, syntactic rules, and linear discriminant analysis, and that outperforms existing systems, especially on lengthy, non-scientific texts.
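Linear discriminant analysis separates keyphrases from non-keyphrases by projecting each candidate's feature vector onto a single direction and then thresholding the result. The sketch below implements two-class Fisher LDA on two features, using only the standard library. The two features are hypothetical: normalized term frequency and relative position of first occurrence. The training values are made up for illustration. The paper's real feature set is larger and is not reproduced here.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def project(w, v):
    """Dot product of the discriminant direction w with feature vector v."""
    return w[0] * v[0] + w[1] * v[1]


def fit_lda(pos, neg):
    """Fit two-class Fisher LDA on 2-D features; return (weights, threshold)."""
    m_pos, m_neg = mean(pos), mean(neg)
    # Pooled within-class scatter matrix S_w (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for group, m in ((pos, m_pos), (neg, m_neg)):
        for v in group:
            d = (v[0] - m[0], v[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    # w = S_w^{-1} (m_pos - m_neg), using the closed-form 2x2 inverse.
    dx, dy = m_pos[0] - m_neg[0], m_pos[1] - m_neg[1]
    w = ((s[1][1] * dx - s[0][1] * dy) / det,
         (-s[1][0] * dx + s[0][0] * dy) / det)
    # Threshold halfway between the projected class means.
    threshold = (project(w, m_pos) + project(w, m_neg)) / 2
    return w, threshold


def is_keyphrase(v, w, threshold):
    """Classify a candidate as a keyphrase if its projection exceeds the threshold."""
    return project(w, v) > threshold


# Hypothetical features: (normalized term frequency, relative first-occurrence position).
pos = [(0.8, 0.1), (0.7, 0.2), (0.9, 0.05), (0.6, 0.15)]
neg = [(0.1, 0.7), (0.2, 0.9), (0.15, 0.6), (0.05, 0.8)]
w, t = fit_lda(pos, neg)
print(is_keyphrase((0.85, 0.1), w, t), is_keyphrase((0.1, 0.85), w, t))
```

With this toy data, frequent candidates that appear early are classified as keyphrases. Rare candidates that first appear late are rejected.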
Findings
Higher precision and recall compared to existing systems
Effective extraction of keyphrases from lengthy Arabic texts
Utilization of linguistic features improves extraction accuracy
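Precision and recall for keyphrase extraction compare a system's output with the keyphrases assigned by the authors. This summary does not state how the paper matches phrases, so the sketch below assumes exact string matching after trimming whitespace and lowercasing. The phrases in the example are hypothetical.

```python
def precision_recall(extracted, gold):
    """Exact-match precision and recall of extracted keyphrases against gold keyphrases."""
    extracted_set = {p.strip().lower() for p in extracted}
    gold_set = {p.strip().lower() for p in gold}
    if not extracted_set or not gold_set:
        # Undefined in the degenerate case; report zeros rather than divide by zero.
        return 0.0, 0.0
    hits = len(extracted_set & gold_set)
    return hits / len(extracted_set), hits / len(gold_set)


extracted = ["keyphrase extraction", "Arabic", "corpus", "stemming"]
gold = ["keyphrase extraction", "arabic", "machine learning"]
p, r = precision_recall(extracted, gold)
print(p, r)
```

Here two of the four extracted phrases match, which gives a precision of 0.5. Two of the three gold phrases are found, which gives a recall of 2/3.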
Abstract
In this paper, a supervised learning technique for extracting keyphrases from Arabic documents is presented. The extractor is supplied with linguistic knowledge to enhance its efficiency instead of relying only on statistical information such as term frequency and distance. During analysis, an annotated Arabic corpus is used to extract the required lexical features of the document words. The knowledge also includes syntactic rules based on part-of-speech tags and allowed word sequences to extract the candidate keyphrases. In this work, the abstract form of Arabic words is used instead of their stem form to represent the candidate terms. The abstract form hides most of the inflections found in Arabic words. The paper introduces new keyphrase features based on linguistic knowledge to capture the titles and subtitles of a document. A simple ANOVA test is used to evaluate the validity of…
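A one-way ANOVA tests whether a quantity differs significantly across groups by comparing between-group variance with within-group variance. The abstract is cut off before it says what the test validates. The sketch below assumes, purely for illustration, that it compares a single feature's values for true keyphrases against its values for non-keyphrases. The feature values are hypothetical.

```python
def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    if ss_within == 0:
        raise ValueError("zero within-group variance; F is undefined")
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical values of one feature for keyphrases vs. non-keyphrases.
keyphrase_scores = [3, 4, 5]
other_scores = [1, 2, 3]
print(one_way_anova_f(keyphrase_scores, other_scores))
```

A large F suggests that the feature separates the two groups. In practice, `scipy.stats.f_oneway` returns the same F statistic together with a p-value.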
Taxonomy
Topics: Advanced Text Analysis Techniques
