Empirical evaluation of shallow and deep learning classifiers for Arabic sentiment analysis
Ali Bou Nassif, Abdollah Masoud Darya, Ashraf Elnagar

TL;DR
This study compares shallow and deep learning models, including the transformer architecture and araBERT, for Arabic sentiment analysis on large multi-dialect hotel and book review datasets. Deep learning generally outperforms shallow methods, and its performance improves as dataset size grows.
Contribution
It provides a comprehensive empirical comparison of various deep and shallow models, highlighting the impact of dataset size and the effectiveness of transformer-based models for Arabic sentiment analysis.
Findings
Deep learning outperforms shallow learning on large datasets
Transformer with araBERT achieves the best accuracy
Random Forest is the top shallow classifier
Abstract
This work presents a detailed comparison of the performance of deep learning models such as convolutional neural networks (CNN), long short-term memory (LSTM), gated recurrent units (GRU), their hybrids, and a selection of shallow learning classifiers for sentiment analysis of Arabic reviews. Additionally, the comparison includes state-of-the-art models such as the transformer architecture and the araBERT pre-trained model. The datasets used in this study are multi-dialect Arabic hotel and book review datasets, which are some of the largest publicly available datasets for Arabic reviews. Results showed deep learning outperforming shallow learning for binary and multi-label classification, in contrast with the results of similar work reported in the literature. This discrepancy in outcome was caused by dataset size as we found it to be proportional to the performance of deep learning…
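To make the "shallow learning" side of the comparison concrete, the following is a minimal sketch of a bag-of-words sentiment classifier in standard-library Python. It rests on several assumptions: multinomial Naive Bayes is used only because it fits in a few lines, and it is not claimed to be one of the classifiers evaluated in the paper. The reviews in `TRAIN` are hypothetical toy examples, not samples from the hotel or book datasets, and the names `tokenize`, `train_nb`, and `predict_nb` are illustrative.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy reviews (NOT taken from the paper's datasets): (text, label).
TRAIN = [
    ("فندق ممتاز خدمة رائعة", "pos"),
    ("غرفة نظيفة موقع ممتاز", "pos"),
    ("كتاب رائع ممتع", "pos"),
    ("فندق سيء خدمة بطيئة", "neg"),
    ("غرفة قذرة موقع سيء", "neg"),
    ("كتاب ممل طويل", "neg"),
]


def tokenize(text):
    # Whitespace split only (an assumption for brevity); practical Arabic
    # pipelines usually normalize and stem tokens before counting them.
    return text.split()


def train_nb(examples):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TRAIN)
print(predict_nb(model, "خدمة رائعة"))
print(predict_nb(model, "موقع سيء"))
```

On this toy data the first query is classified as positive and the second as negative. A model of this kind relies only on word counts, which is one reason such baselines can be trained on small corpora, whereas the abstract reports that deep models benefit from larger datasets.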
