Constructing Colloquial Dataset for Persian Sentiment Analysis of Social Microblogs
Mojtaba Mazoochi (ICT Research Institute, Tehran, Iran), Leila Rabiei (Iran Telecommunication Research Center (ITRC), Tehran, Iran), Farzaneh Rahmani (Computer Department, Mehralborz University, Tehran, Iran), Zeinab Rajabi (Computer Department, Hazrat-e Masoumeh University

TL;DR
This paper introduces a new colloquial Persian sentiment dataset drawn from social microblogs and proposes a CNN-based architecture that improves sentiment classification of informal Persian text, reaching 72% accuracy.
Contribution
It constructs the first large-scale colloquial Persian sentiment dataset and develops a CNN-based model tailored for social media text analysis in low-resource languages.
Findings
Constructed a 60,000-item Persian social microblog dataset.
Achieved 72% accuracy with the proposed CNN model.
Demonstrated improved sentiment classification performance.
Abstract
Introduction: Microblogging websites have amassed rich data sources for sentiment analysis and opinion mining. Sentiment classification on this data has frequently proven ineffective, however, because microblog posts typically lack syntactically consistent terms and representatives, since users on these social networks tend not to write lengthy statements. Low-resource languages pose additional limitations. The Persian language has distinctive characteristics and demands its own annotated data and models for the sentiment analysis task, which differ from those built on the text features of English. Method: This paper first constructs a user opinion dataset called ITRC-Opinion, built in-house in a collaborative environment. The dataset contains 60,000 informal and colloquial Persian texts from social microblogs such as Twitter and Instagram. Second, this study proposes a new…
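The summary above names a CNN-based classifier but does not give its layers or hyperparameters. As a rough illustration of the general pattern such models follow (token embeddings, a 1D convolution over token windows, max-over-time pooling, and a softmax output), here is a minimal pure-Python sketch. It is not the authors' architecture: the function names, the three-class output, the filter width, and all sizes are assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def conv1d_max_pool(embeddings, filters, bias):
    """Slide each filter over token windows, apply ReLU, then max-pool over time."""
    width = len(filters[0])  # number of tokens covered by one filter
    pooled = []
    for filt, b in zip(filters, bias):
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            act = b + sum(
                w * x
                for w_row, x_row in zip(filt, window)
                for w, x in zip(w_row, x_row)
            )
            best = max(best, act)
        pooled.append(best)
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(vocab_size, dim=8, n_filters=4, width=3, n_classes=3, seed=0):
    """Random, untrained parameters; every size here is a hypothetical choice."""
    rng = random.Random(seed)

    def rand(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    return {
        "embed": [rand(dim) for _ in range(vocab_size)],
        "filters": [[rand(dim) for _ in range(width)] for _ in range(n_filters)],
        "filter_bias": [0.0] * n_filters,
        "dense": [rand(n_filters) for _ in range(n_classes)],
        "dense_bias": [0.0] * n_classes,
    }


def classify(token_ids, params):
    """Forward pass: embed tokens, convolve and pool, then score each class."""
    embeddings = [params["embed"][t] for t in token_ids]
    features = conv1d_max_pool(embeddings, params["filters"], params["filter_bias"])
    logits = [
        b + sum(w * f for w, f in zip(row, features))
        for row, b in zip(params["dense"], params["dense_bias"])
    ]
    return softmax(logits)


# Hypothetical usage: five token ids from an assumed 50-word vocabulary.
params = init_params(vocab_size=50)
probs = classify([3, 17, 42, 8, 5], params)
```

Because the weights are untrained, the resulting probabilities carry no meaning. The sketch only shows how a variable-length post is reduced to a fixed-size feature vector before classification, which is one reason convolutional models are a common fit for short microblog text.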
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Web Data Mining and Analysis · Topic Modeling
Methods: Tanh Activation · Sigmoid Activation · Bidirectional LSTM · Long Short-Term Memory · Bidirectional GRU
