Combining Lexical Features and a Supervised Learning Approach for Arabic Sentiment Analysis
Samhaa R. El-Beltagy, Talaat Khalil, Amal Halaby, and Muhammad Hammad

TL;DR
This paper introduces a sentiment analysis model for Arabic social media text that combines lexicon-derived features, emoticon features, and features of the input text itself, outperforming existing systems on six of seven datasets spanning several Arabic dialects and Modern Standard Arabic.
Contribution
The paper presents a new feature set, much of it derived from an Arabic sentiment lexicon, and a machine learning model built on it that improves Arabic sentiment analysis performance on diverse social media datasets.
Findings
Achieved higher accuracy on six out of seven datasets
Outperformed all existing publicly available Arabic sentiment analysis systems
Validated on Egyptian, Saudi, and Levantine dialect datasets as well as MSA datasets
Abstract
The importance of building sentiment analysis tools for Arabic social media has been recognized during the past couple of years, especially with the rapid increase in the number of Arabic social media users. One of the main difficulties in tackling this problem is that text within social media is mostly colloquial, with many dialects being used within social media platforms. In this paper, we present a set of features that were integrated with a machine-learning-based sentiment analysis model and applied to Egyptian, Saudi, Levantine, and MSA Arabic social media datasets. Many of the proposed features were derived through the use of an Arabic Sentiment Lexicon. The model also presents emoticon-based features, as well as input text related features such as the number of segments within the text, the length of the text, whether the text ends with a question mark or not, etc. We show that…
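The abstract names three families of features: lexicon-derived, emoticon-based, and input-text related (number of segments, text length, trailing question mark). The following is a minimal sketch of extracting one feature of each kind from a single post, under several assumptions that are not from the paper: `TOY_LEXICON` is a tiny hypothetical stand-in for the Arabic Sentiment Lexicon, the emoticon lists are hypothetical, a "segment" is taken to be a span delimited by Latin or Arabic punctuation, and length is measured in word tokens. The paper's actual definitions may differ.

```python
import re
from dataclasses import astuple, dataclass

# HYPOTHETICAL toy lexicon (word -> polarity); a stand-in for the paper's
# Arabic Sentiment Lexicon, which is not reproduced here.
TOY_LEXICON = {
    "جميل": "positive",  # "beautiful"
    "رائع": "positive",  # "wonderful"
    "سيء": "negative",   # "bad"
    "حزين": "negative",  # "sad"
}

# HYPOTHETICAL emoticon lists; the paper's emoticon inventory may differ.
POSITIVE_EMOTICONS = (":)", ":-)", ":D", "😀", "❤")
NEGATIVE_EMOTICONS = (":(", ":-(", "😢", "😡")

# ASSUMPTION: a "segment" is a span delimited by Latin or Arabic punctuation
# (Arabic comma U+060C, semicolon U+061B, question mark U+061F).
SEGMENT_DELIMITERS = re.compile(r"[.!?,;،؛؟]+")

# Word tokens; in str patterns Python's \w matches Arabic letters.
WORD = re.compile(r"\w+")


@dataclass
class PostFeatures:
    pos_terms: int            # lexicon-derived
    neg_terms: int            # lexicon-derived
    pos_emoticons: int        # emoticon-based
    neg_emoticons: int        # emoticon-based
    num_segments: int         # text-related
    num_tokens: int           # text-related (ASSUMPTION: length in word tokens)
    ends_with_question: bool  # text-related

    def as_vector(self) -> list[float]:
        """Flatten the features (bool -> 0.0/1.0) into a numeric vector."""
        return [float(v) for v in astuple(self)]


def extract_features(text: str) -> PostFeatures:
    tokens = WORD.findall(text)
    # Look up each token's polarity (None if it is not in the toy lexicon).
    polarities = [TOY_LEXICON.get(t) for t in tokens]
    # Drop empty spans, e.g. the one after a trailing delimiter.
    segments = [s for s in SEGMENT_DELIMITERS.split(text) if s.strip()]
    return PostFeatures(
        pos_terms=polarities.count("positive"),
        neg_terms=polarities.count("negative"),
        pos_emoticons=sum(text.count(e) for e in POSITIVE_EMOTICONS),
        neg_emoticons=sum(text.count(e) for e in NEGATIVE_EMOTICONS),
        num_segments=len(segments),
        num_tokens=len(tokens),
        # Accept both the Latin "?" and the Arabic question mark "؟".
        ends_with_question=text.rstrip().endswith(("?", "؟")),
    )
```

For example, `extract_features("هل المطعم سيء؟").ends_with_question` is `True` because the Arabic question mark is recognized. Each `as_vector()` result would become one row of training data for a supervised classifier; this excerpt does not name the learner the authors used, so the sketch stops at feature extraction.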
