Building a Sentiment Corpus of Tweets in Brazilian Portuguese
Henrico Bertini Brum, Maria das Graças Volpe Nunes

TL;DR
This paper presents TweetSentBR, a manually annotated sentiment corpus of 15,000 Brazilian Portuguese tweets about TV shows, and evaluates baseline machine learning models for sentiment classification.
Contribution
Introduces TweetSentBR, a new sentiment dataset for Brazilian Portuguese, with reliable annotations and baseline classification results.
Findings
Achieved 80.99% F-Measure in binary sentiment classification.
Reached 59.85% F-Measure in three-class sentiment classification.
Demonstrated the dataset's utility for sentiment analysis research.
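These findings report F-Measure and accuracy, but this summary does not say how the F-Measure is averaged over classes. As a sketch only, assuming macro-averaging (the unweighted mean of per-class F1 scores), both metrics can be computed as below. The function names and toy labels are hypothetical and are not the authors' evaluation code.

```python
def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=None):
    """Unweighted mean of per-class F1 scores (assumed macro-averaging)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    if labels is None:
        labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        # Count true positives, false positives and false negatives for this class.
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy binary example (hypothetical labels, not TweetSentBR data).
gold = ["pos", "neg", "pos", "neg", "pos"]
pred = ["pos", "neg", "neg", "neg", "pos"]
print(round(accuracy(gold, pred), 4))  # 0.8
print(round(macro_f1(gold, pred), 4))  # 0.8
```

The same functions apply unchanged to the three-class setting by passing positive, neutral and negative labels; under class imbalance, macro F1 and accuracy can diverge noticeably.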
Abstract
The large amount of data available on social media, forums and websites motivates research in several areas of Natural Language Processing, such as sentiment analysis. The popularity of the area, driven by its subjective and semantic characteristics, motivates research on novel methods and approaches for classification. Hence, there is high demand for datasets in different domains and languages. This paper introduces TweetSentBR, a sentiment corpus for Brazilian Portuguese manually annotated with 15,000 sentences in the TV show domain. The sentences were labeled in three classes (positive, neutral and negative) by seven annotators, following guidelines from the literature to ensure reliable annotation. We also ran baseline experiments on polarity classification using three machine learning methods, reaching 80.99% F-Measure and 82.06% accuracy in binary classification,…
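The abstract states that seven annotators labeled the sentences following reliability guidelines, but this summary does not name the agreement coefficient used. As an illustration only, assuming every tweet received the same number of labels, Fleiss' kappa is one standard way to quantify agreement among several annotators on nominal classes. The function and toy labels below are hypothetical and are not the authors' procedure.

```python
from collections import Counter


def fleiss_kappa(annotations, categories):
    """Fleiss' kappa for items that were each labeled by the same number of annotators.

    annotations: one list of labels per item, e.g. ["pos", "pos", "neg"].
    categories: every label an annotator could choose.
    """
    if not annotations:
        raise ValueError("need at least one item")
    n_raters = len(annotations[0])
    if n_raters < 2:
        raise ValueError("need at least two annotators per item")
    allowed = set(categories)
    counts = []
    for item in annotations:
        if len(item) != n_raters:
            raise ValueError("every item must have the same number of labels")
        if not set(item) <= allowed:
            raise ValueError("found a label outside the category set")
        counts.append(Counter(item))

    n_items = len(annotations)
    # Observed agreement per item: share of annotator pairs that agree.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category.
    total = n_items * n_raters
    p_e = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every label is the same category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical labels from three annotators on four tweets.
labels = [
    ["pos", "pos", "neg"],
    ["neg", "neg", "neg"],
    ["neu", "pos", "neg"],
    ["pos", "pos", "pos"],
]
print(round(fleiss_kappa(labels, ["pos", "neu", "neg"]), 4))  # 0.2683
```

Kappa corrects raw agreement for the agreement expected by chance: a value near 0 means annotators agree no more than chance would predict, and 1 means perfect agreement.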
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Advanced Text Analysis Techniques · Topic Modeling
