Arabic Keyphrase Extraction using Linguistic knowledge and Machine Learning Techniques

Tarek El-shishtawy; Abdulwahab Al-sammak

arXiv:1203.4605·cs.CL·March 22, 2012·32 cites

TL;DR

This paper presents a supervised machine learning approach for extracting Arabic keyphrases that incorporates linguistic knowledge and syntactic rules, achieving significantly improved precision and recall over existing systems.

Contribution

It introduces a novel Arabic keyphrase extraction method combining linguistic features, syntactic rules, and linear discriminant analysis, outperforming existing systems especially on lengthy, non-scientific texts.

Findings

01. Higher precision and recall compared to existing systems
02. Effective extraction of keyphrases from lengthy Arabic texts
03. Utilization of linguistic features improves extraction accuracy

Abstract

In this paper, a supervised learning technique for extracting keyphrases from Arabic documents is presented. The extractor is supplied with linguistic knowledge to enhance its efficiency instead of relying only on statistical information such as term frequency and distance. During analysis, an annotated Arabic corpus is used to extract the required lexical features of the document words. The knowledge also includes syntactic rules, based on part-of-speech tags and allowed word sequences, that are used to extract the candidate keyphrases. In this work, the abstract form of Arabic words is used instead of their stem form to represent the candidate terms. The abstract form hides most of the inflections found in Arabic words. The paper introduces new features of keyphrases, based on linguistic knowledge, to capture titles and subtitles of a document. A simple ANOVA test is used to evaluate the validity of…
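The candidate-extraction step described in the abstract (part-of-speech tags plus allowed word sequences) can be illustrated with a minimal sketch. The tag names, the pattern set `ALLOWED_PATTERNS`, and the function `extract_candidates` are all hypothetical and are not taken from the paper; the actual rules, tag set, and abstract-form normalization used by the authors are not given in this summary.

```python
from typing import List, Tuple

# Hypothetical allowed POS sequences; the paper's real rule set is not shown here.
ALLOWED_PATTERNS = {
    ("NOUN",),
    ("NOUN", "NOUN"),
    ("NOUN", "ADJ"),
    ("NOUN", "NOUN", "ADJ"),
}


def extract_candidates(tagged: List[Tuple[str, str]], max_len: int = 3) -> List[str]:
    """Return every word window whose POS tag sequence is an allowed pattern.

    `tagged` is a list of (word, tag) pairs. In a real pipeline each word
    would already be reduced to its abstract form (an assumption here).
    """
    candidates = []
    for start in range(len(tagged)):
        for length in range(1, max_len + 1):
            window = tagged[start:start + length]
            if len(window) < length:
                break  # window ran past the end of the sentence
            tags = tuple(tag for _, tag in window)
            if tags in ALLOWED_PATTERNS:
                candidates.append(" ".join(word for word, _ in window))
    return candidates


# Placeholder tokens stand in for abstract-form Arabic words.
sentence = [("w1", "NOUN"), ("w2", "NOUN"), ("w3", "VERB"), ("w4", "ADJ")]
candidates = extract_candidates(sentence)
```

In a supervised setup like the one the abstract describes, each candidate produced this way would then be scored by a trained classifier using statistical and linguistic features; that stage is not sketched here.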

Taxonomy

Topics: Advanced Text Analysis Techniques