SentiPers: A Sentiment Analysis Corpus for Persian
Pedram Hosseini, Ali Ahmadian Ramaki, Hassan Maleki, Mansoureh Anvari, Seyed Abolghasem Mirroshandel

TL;DR
This paper introduces SentiPers, a manually annotated sentiment corpus for Persian that covers formal and informal text with annotations at the document, sentence, and entity/aspect levels, supporting sentiment analysis research on low-resource languages.
Contribution
It presents the creation of a unique Persian sentiment corpus with multi-level annotations, including a large dataset of over 26,000 sentences with sentiment scores.
Findings
Annotations are provided at three levels: document, sentence, and entity/aspect.
The corpus contains over 26,000 sentences from digital product opinions.
Inter-annotator agreement and annotation challenges are analyzed.
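As an illustration of how inter-annotator agreement on sentence-level labels can be quantified, here is a minimal Python sketch of Cohen's kappa for two annotators. This summary does not say which agreement measure the paper uses, so the choice of kappa, the function name `cohen_kappa`, and the label values below are all assumptions made for the example, not details of SentiPers.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up sentence-level polarity labels from two annotators (not corpus data).
annotator_a = ["positive", "neutral", "negative", "positive", "neutral", "positive"]
annotator_b = ["positive", "neutral", "negative", "neutral", "neutral", "positive"]
kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one polarity class dominates, as is common in product reviews.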
Abstract
Sentiment Analysis (SA) is a major field of study in natural language processing, computational linguistics and information retrieval. Interest in SA has been constantly growing in both academia and industry over the recent years. Moreover, there is an increasing need for generating appropriate resources and datasets in particular for low resource languages including Persian. These datasets play an important role in designing and developing appropriate opinion mining platforms using supervised, semi-supervised or unsupervised methods. In this paper, we outline the entire process of developing a manually annotated sentiment corpus, SentiPers, which covers formal and informal written contemporary Persian. To the best of our knowledge, SentiPers is a unique sentiment corpus with such a rich annotation in three different levels including document-level, sentence-level, and…
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Advanced Text Analysis Techniques · Topic Modeling
