Exa-PSD: a new Persian sentiment analysis dataset on Twitter
Seyed Himan Ghaderi, Saeed Sarbazi Azad, Mohammad Mehdi Jaziriyan, Ahmad Akbari

TL;DR
This paper introduces Exa-PSD, a new Persian Twitter sentiment dataset of 12,000 annotated tweets, and evaluates it with the pre-trained language models ParsBERT and RoBERTa, reaching a Macro F-score of 79.87%.
Contribution
The paper provides the first large-scale Persian Twitter sentiment dataset with annotations and demonstrates its usefulness with baseline model evaluations.
Findings
Achieved a 79.87% Macro F-score using ParsBERT and RoBERTa.
Dataset contains 12,000 tweets annotated into three classes: positive, neutral, and negative.
Dataset characteristics and statistics are detailed.
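For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Macro F-score over the three sentiment classes can be computed. It is not the authors' evaluation code: the function name `macro_f1`, the `LABELS` tuple, and the toy labels in the usage lines are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Assumed label names, matching the three classes listed in the findings.
LABELS = ("positive", "neutral", "negative")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Return the unweighted mean of the per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with made-up labels (not taken from the dataset).
y_true = ["positive", "positive", "neutral", "negative"]
y_pred = ["positive", "neutral", "neutral", "negative"]
print(f"Macro F1: {macro_f1(y_true, y_pred):.4f}")
```

Because every class contributes equally to the average, a model that ignores a minority class is penalized more heavily than it would be under plain accuracy, which is one reason the macro variant is commonly reported for class-imbalanced sentiment data.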
Abstract
Today, social networks such as Twitter are among the most widely used platforms for communication. Analyzing this data yields useful information about the opinions people express in tweets. Sentiment analysis, which identifies an individual's opinion about a specific topic, plays a vital role in NLP. Natural language processing in Persian still faces many challenges despite the advent of strong language models. The datasets available in Persian generally cover specific domains such as products, foods, and hotels, while users on social media may use irony and colloquial phrases. To overcome these challenges, a Persian sentiment analysis dataset built from Twitter is needed. In this paper, we introduce the Exa sentiment analysis Persian dataset, which is collected from Persian tweets. This dataset contains 12,000 tweets, annotated by 5 native Persian taggers. The…
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Spam and Phishing Detection · Topic Modeling
