Egyptian Dialect Stopword List Generation from Social Network Data
Walaa Medhat, Ahmed H. Yousef, Hoda Korashy

TL;DR
This paper develops an Egyptian Dialect stopword list from social media data and demonstrates that removing these stopwords improves sentiment analysis performance compared to using only Modern Standard Arabic stopwords.
Contribution
It introduces a novel methodology for generating Egyptian Dialect stopword lists from social network data and evaluates their impact on sentiment analysis accuracy.
Findings
Removing Egyptian Dialect (ED) stopwords improves sentiment analysis accuracy.
The ED stopword list outperforms the Modern Standard Arabic (MSA) stopword list.
Combining ED and MSA stopwords yields the best results.
Abstract
This paper proposes a methodology for generating a stopword list from online social network (OSN) corpora in Egyptian Dialect (ED). The aim of the paper is to investigate the effect of removing ED stopwords on the Sentiment Analysis (SA) task. Previously generated stopword lists cover Modern Standard Arabic (MSA), which is not the language commonly used in OSNs. We have generated an Egyptian Dialect stopword list to be used with OSN corpora. We compare the efficiency of text classification when using the generated list, the previously generated MSA lists, and the combination of the Egyptian Dialect list with the MSA list. Text classification was performed using Naïve Bayes and Decision Tree classifiers and two feature selection approaches, unigram and bigram. The experiments show that removing ED stopwords gives better performance than using MSA stopword lists only.
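The preprocessing compared in the abstract (ED list only, MSA list only, or both combined, followed by unigram or bigram features) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the stopword entries are a handful of common function words chosen for the example rather than the lists generated in the paper, whitespace tokenization is an assumption, and the helper names `remove_stopwords`, `ngrams`, and `extract_features` are hypothetical.

```python
# Illustrative entries only; these are NOT the stopword lists from the paper.
ED_STOPWORDS = {"ده", "دي", "اللي", "عشان"}    # assumed Egyptian Dialect function words
MSA_STOPWORDS = {"في", "من", "على", "هذا"}     # assumed Modern Standard Arabic function words
COMBINED_STOPWORDS = ED_STOPWORDS | MSA_STOPWORDS


def remove_stopwords(tokens, stopwords):
    """Drop every token that appears in the given stopword set."""
    return [t for t in tokens if t not in stopwords]


def ngrams(tokens, n):
    """Return contiguous n-grams (n=1 unigrams, n=2 bigrams) as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, stopwords, n):
    """Whitespace-tokenize, remove stopwords, then build n-gram features."""
    tokens = text.split()  # assumption: simple whitespace tokenization
    return ngrams(remove_stopwords(tokens, stopwords), n)


if __name__ == "__main__":
    post = "الفيلم ده حلو في السينما"
    for name, sw in [("ED", ED_STOPWORDS),
                     ("MSA", MSA_STOPWORDS),
                     ("ED+MSA", COMBINED_STOPWORDS)]:
        print(name, extract_features(post, sw, 1), extract_features(post, sw, 2))
```

The resulting feature lists would then be fed to a classifier such as Naïve Bayes or a Decision Tree. Note that removing stopwords before forming bigrams changes which word pairs become adjacent, which is one way the choice of list can affect bigram features.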
