Arabic Text Mining
Sumaia Mohammed AL-Ghuribi, Shahrul Azman Mohd Noah

TL;DR
This paper introduces a new Arabic text classification system that combines light stemming with a Naive Bayesian classifier, and it demonstrates effective categorization of politics and sports texts despite the complexity of the language.
Contribution
It presents a novel Arabic text classification approach that combines light stemming with a Naive Bayesian classifier, addressing the scarcity of Arabic-specific text mining methods.
Findings
The system correctly classified new texts from the politics and sports classes.
Effective classification was achieved with the proposed approach.
The approach addresses challenges that the Arabic language poses for text mining.
Abstract
The rapid growth of the internet has increased the number of online texts, including texts in the Arabic language. This enormous amount of text must be organized into classes to make analysis and text retrieval easier. Text classification is, therefore, a key component of text mining. There are numerous systems and approaches for categorizing text in English, other European languages (French, German, Spanish), and Asian languages (Chinese, Japanese). In contrast, there are relatively few studies on categorizing Arabic text due to the difficulty of the Arabic language. In this work, a brief explanation of key ideas relevant to Arabic text mining is introduced, then a new classification system for the Arabic language is presented using light stemming and Classifier Naïve Bayesian (CNB). Texts from two classes: politics and sports, are included in…
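The pipeline the abstract describes (light stemming followed by a Naive Bayesian classifier over two classes) can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the affix lists, the minimum stem length of three letters, the multinomial model with add-one smoothing, and the four toy training sentences are all assumptions made here for demonstration.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed affix lists (illustrative only, not taken from the paper).
# Longer prefixes come first so they are tried before shorter ones.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")  # short vowels and tatweel

def light_stem(word):
    """Strip diacritics, normalize alef, then remove at most one prefix and one suffix."""
    word = DIACRITICS.sub("", word)
    word = re.sub("[أإآ]", "ا", word)
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[:-len(s)]
            break
    return word

def tokenize(text):
    """Extract runs of Arabic letters and light-stem each one."""
    return [light_stem(w) for w in re.findall(r"[\u0621-\u0652]+", text)]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        tokens = tokenize(doc)
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data, invented for this sketch.
train_docs = [
    "الحكومة تناقش قانون الانتخابات في البرلمان",
    "الرئيس يلتقي الوزراء لبحث السياسة الخارجية",
    "الفريق يفوز في مباراة كرة القدم",
    "اللاعبون يستعدون لبطولة الدوري",
]
train_labels = ["politics", "politics", "sports", "sports"]

clf = NaiveBayes().fit(train_docs, train_labels)
print(clf.predict("فاز اللاعب في المباراة"))
print(clf.predict("البرلمان يناقش السياسة"))
```

Light stemming matters here because it maps surface variants such as the definite plural and the bare singular of a noun to the same token, so a word seen once in training can still contribute evidence when it appears in a different inflected form at prediction time.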
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques
