TL;DR
This paper introduces PELESent, a cross-domain sentiment classification method for Portuguese. It uses emojis and emoticons as distant supervision to label nearly one million tweets, and the resulting models achieve competitive results across multiple domains.
Contribution
It extends distant supervision techniques to emojis for Portuguese sentiment analysis. It also shows that the resulting models are effective across domains and that emojis help capture sentiment.
Findings
Achieved competitive results on five diverse corpora
Demonstrated domain independence of the approach
Showed emojis and emoticons effectively capture sentiment
Abstract
The enormous amount of text published daily by Internet users has fostered the development of methods to analyze this content in several natural language processing areas, such as sentiment analysis. The main goal of this task is to classify the polarity of a message. Although many approaches have been proposed for sentiment analysis, some of the most successful ones rely on large annotated corpora, whose construction is expensive and time-consuming. In recent years, distant supervision has been used to obtain larger datasets. Inspired by these techniques, in this paper we extend such approaches to incorporate emojis, the popular graphic symbols used in electronic messages, in order to create a large sentiment corpus for Portuguese. Several models were trained on almost one million tweets and tested on both same-domain and cross-domain corpora. Our methods obtained…
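The labeling step behind emoji-based distant supervision can be sketched as follows. This is a minimal illustration, not the PELESent pipeline. The marker sets below are hypothetical. Discarding tweets that contain no markers or markers of both polarities is an assumption, and so is removing the markers from the text afterwards. Marker removal is a common choice in emoticon-based distant supervision because it prevents a classifier from simply memorizing the labeling signal.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical marker sets for illustration only; these are NOT the
# paper's actual emoji/emoticon lists.
POSITIVE_MARKERS = {":)", ":-)", ":D", "😀", "😂", "😍", "❤"}
NEGATIVE_MARKERS = {":(", ":-(", ":'(", "😢", "😠", "😞"}


def weak_label(tweet: str) -> Optional[str]:
    """Assign a noisy polarity label from emoji/emoticon markers.

    Returns "positive" or "negative" when markers of only one polarity
    occur, and None when there are no markers or mixed markers
    (such tweets are discarded). Uses naive substring matching.
    """
    has_pos = any(m in tweet for m in POSITIVE_MARKERS)
    has_neg = any(m in tweet for m in NEGATIVE_MARKERS)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return None


def strip_markers(tweet: str) -> str:
    """Remove all markers so the label cannot be read off the text."""
    # Longest markers first, so multi-character markers are removed
    # before any shorter marker they might contain.
    for m in sorted(POSITIVE_MARKERS | NEGATIVE_MARKERS, key=len, reverse=True):
        tweet = tweet.replace(m, " ")
    # Collapse the whitespace left behind by the removals.
    return " ".join(tweet.split())


def build_corpus(tweets: Iterable[str]) -> List[Tuple[str, str]]:
    """Turn raw tweets into (text, label) pairs via distant supervision."""
    corpus = []
    for t in tweets:
        label = weak_label(t)
        if label is not None:
            corpus.append((strip_markers(t), label))
    return corpus


if __name__ == "__main__":
    sample = ["Adorei o filme :)", "Que dia horrível 😢",
              "Sem emoji aqui", "Amei :) mas triste :("]
    print(build_corpus(sample))
    # [('Adorei o filme', 'positive'), ('Que dia horrível', 'negative')]
```

The resulting (text, label) pairs would then serve as noisy training data for any standard text classifier. The paper's actual preprocessing and model choices are not reproduced here.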
