An Enhanced Corpus for Arabic Newspapers Comments
Hichem Rahab, Abdelhafid Zitouni, Mahieddine Djoudi (TECHNÉ - EA 6316)

TL;DR
This paper presents an improved method for creating and annotating a corpus of Algerian Arabic newspaper comments, evaluating classifiers with and without stemming, and highlighting challenges with dialectal and mixed-language comments.
Contribution
The paper introduces an enhanced corpus creation approach using the MATTER annotation method and evaluates classification techniques on Algerian Arabic comments.
Findings
Stemming does not significantly improve classification accuracy.
Support vector machines perform well in classifying comments.
Challenges remain with dialectal and non-Arabic comments.
Abstract
In this paper, we propose an enhanced approach to creating a dedicated corpus of Algerian Arabic newspaper comments. The approach enhances an existing one by enriching the available corpus and adding an annotation step that follows the Model Annotate Train Test Evaluate Revise (MATTER) approach. The corpus is created by collecting comments from the web sites of three well-known Algerian newspapers. Three classifiers, support vector machines, naïve Bayes, and k-nearest neighbors, were used to classify comments into positive and negative classes. To identify the influence of stemming on the results, the classification was tested with and without stemming. The results show that stemming does not considerably improve classification, owing to the nature of Algerian comments, which are tied to the Algerian Arabic dialect. The promising…
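The with/without-stemming comparison described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: this summary does not say which stemmer or toolkit the authors used, so `light_stem` is a hypothetical affix stripper, the classifier is a small hand-written multinomial naïve Bayes, and the four training comments are invented toy strings, not items from the corpus.

```python
import math
from collections import Counter, defaultdict

# Hypothetical light stemmer (NOT the paper's): strips at most one common
# Arabic prefix and one suffix, keeping at least three characters.
PREFIXES = ("وال", "بال", "ال", "و")
SUFFIXES = ("ات", "ون", "ين", "ة")

def light_stem(token):
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= 3:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= 3:
            token = token[:-len(s)]
            break
    return token

def tokenize(text, stem=False):
    tokens = text.split()
    return [light_stem(t) for t in tokens] if stem else tokens

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

def evaluate(train, test, stem):
    """Train on `train`, return accuracy on `test`, with or without stemming."""
    model = NaiveBayes().fit(
        [tokenize(text, stem) for text, _ in train],
        [label for _, label in train],
    )
    hits = sum(model.predict(tokenize(text, stem)) == label for text, label in test)
    return hits / len(test)

# Invented toy comments, for illustration only.
TRAIN = [("مقال رائع", "pos"), ("خبر جيد", "pos"),
         ("مقال سيء", "neg"), ("خبر كاذب", "neg")]
TEST = [("المقال رائع", "pos"), ("الخبر كاذب", "neg")]

if __name__ == "__main__":
    for stem in (False, True):
        print("stemming" if stem else "no stemming", evaluate(TRAIN, TEST, stem))
```

On real data the two settings would be compared on a held-out split of the annotated corpus; a rule-based stemmer built for Modern Standard Arabic may leave many dialectal or non-Arabic tokens unchanged, which is one way the small effect reported in the abstract could arise.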
