Sentiment Analysis of Arabic Tweets: Feature Engineering and A Hybrid Approach

Nora Al-Twairesh; Hend Al-Khalifa; AbdulMalik Alsalman; Yousef Al-Ohali

arXiv:1805.08533·cs.CL·May 23, 2018·27 cites

Open Access

TL;DR

This paper presents a hybrid sentiment analysis approach for Saudi-dialect Arabic tweets that combines feature engineering with corpus-based and lexicon-based methods, reaching best F1-scores of 69.9, 61.63, and 55.07 for two-way, three-way, and four-way classification.

Contribution

It introduces a hybrid method tailored to Saudi-dialect Arabic tweets, pairing backward feature selection over engineered features with a combination of corpus-based and lexicon-based techniques.
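This page does not spell out how the corpus-based and lexicon-based signals are combined. One common arrangement, shown here only as a minimal sketch and not as the authors' method, is to append lexicon-derived counts to a bag-of-words feature vector before training a classifier. The function name `hybrid_features`, the feature keys, and the toy English lexicon are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def hybrid_features(
    tokens: Iterable[str], positive: Set[str], negative: Set[str]
) -> Dict[str, float]:
    """Merge corpus-based (bag-of-words) and lexicon-based features.

    Hypothetical helper: the feature names are illustrative only.
    """
    tokens = list(tokens)
    # Corpus-based part: one count feature per token seen in the tweet.
    feats = {f"bow={tok}": float(n) for tok, n in Counter(tokens).items()}
    # Lexicon-based part: how many tokens appear in each polarity list.
    pos = sum(tok in positive for tok in tokens)
    neg = sum(tok in negative for tok in tokens)
    feats["lex_pos"] = float(pos)
    feats["lex_neg"] = float(neg)
    feats["lex_polarity"] = float(pos - neg)
    return feats


# Toy English placeholders; a real system would use a dialect-specific lexicon.
feats = hybrid_features(
    "great service but slow delivery".split(),
    positive={"great"},
    negative={"slow"},
)
print(feats)
```

The resulting dictionary can be fed to any classifier that accepts sparse feature maps, so the lexicon signal is weighed alongside the word counts rather than applied as a separate rule.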

Findings

01

Best F1-score for two-way classification: 69.9

02

Best F1-scores for three-way and four-way classification: 61.63 and 55.07, respectively

03

Effective feature engineering improved sentiment classification accuracy.

Abstract

Sentiment analysis in Arabic is a challenging task due to the rich morphology of the language. The task is further complicated when applied to Twitter data, which is known to be highly informal and noisy. In this paper, we develop a hybrid method for sentiment analysis of Arabic tweets in a specific Arabic dialect, the Saudi dialect. Several features were engineered and evaluated using a backward feature selection method. Then a hybrid method that combines corpus-based and lexicon-based methods was developed for several classification models (two-way, three-way, four-way). The best F1-scores for these models were 69.9, 61.63, and 55.07, respectively.
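The backward feature selection mentioned in the abstract can be illustrated with a generic greedy elimination loop: start from the full feature set and repeatedly drop the one feature whose removal raises the evaluation score the most, stopping when no removal helps. This is a sketch of the general technique, not the paper's exact procedure. The function `toy_f1`, the feature names, and their weights are invented stand-ins for "train a classifier on this subset and return its validation F1".

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def backward_select(
    features: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination over a set of feature names."""
    current = frozenset(features)
    best = score(current)
    while len(current) > 1:
        # Score every subset obtained by removing exactly one feature.
        candidates = [(score(current - {f}), f) for f in sorted(current)]
        cand_score, cand_feature = max(candidates)
        if cand_score <= best:
            break  # no single removal improves the score
        current, best = current - {cand_feature}, cand_score
    return current, best


def toy_f1(subset: FrozenSet[str]) -> float:
    """Hypothetical scorer: helpful features add to the score, noisy ones subtract."""
    useful = {"unigrams": 0.40, "lexicon_score": 0.20, "negation": 0.05}
    noisy = {"tweet_length": 0.03, "has_url": 0.02}
    gain = sum(useful.get(f, 0.0) for f in subset)
    loss = sum(noisy.get(f, 0.0) for f in subset)
    return gain - loss


selected, f1 = backward_select(
    ["unigrams", "lexicon_score", "negation", "tweet_length", "has_url"],
    toy_f1,
)
print(sorted(selected), round(f1, 2))
```

In practice the scorer would retrain and evaluate a model for each candidate subset, so each pass costs one training run per remaining feature.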

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Advanced Text Analysis Techniques · Topic Modeling