Building a Sentiment Corpus of Tweets in Brazilian Portuguese

Henrico Bertini Brum; Maria das Graças Volpe Nunes

arXiv:1712.08917·cs.CL·December 27, 2017·30 cites

TL;DR

This paper presents TweetSentBR, a manually annotated sentiment corpus of 15,000 Brazilian Portuguese tweets about TV shows, and evaluates baseline machine learning models for sentiment classification.

Contribution

Introduces TweetSentBR, a new sentiment dataset for Brazilian Portuguese, with reliable annotations and baseline classification results.

Findings

01

Achieved 80.99% F-Measure in binary sentiment classification.

02

Reached 59.85% F-Measure in three-class sentiment classification.

03

Demonstrated the dataset's utility for sentiment analysis research.

Abstract

The large amount of data available in social media, forums and websites motivates research in several areas of Natural Language Processing, such as sentiment analysis. The popularity of the area due to its subjective and semantic characteristics motivates research on novel methods and approaches for classification. Hence, there is a high demand for datasets on different domains and different languages. This paper introduces TweetSentBR, a sentiment corpus for Brazilian Portuguese manually annotated with 15,000 sentences in the TV show domain. The sentences were labeled in three classes (positive, neutral and negative) by seven annotators, following literature guidelines for ensuring annotation reliability. We also ran baseline experiments on polarity classification using three machine learning methods, reaching 80.99% F-Measure and 82.06% accuracy in binary classification,…
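The two metrics quoted in the abstract, accuracy and F-Measure, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names and the toy labels are hypothetical, and macro-averaging the per-class F-Measure is an assumption, since the abstract does not say how the score was averaged.

```python
def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f_measure_for_label(gold, pred, label):
    """Harmonic mean of precision and recall for a single class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f_measure(gold, pred):
    """Unweighted mean of per-class F-Measure (averaging scheme is an assumption)."""
    labels = sorted(set(gold) | set(pred))
    return sum(f_measure_for_label(gold, pred, l) for l in labels) / len(labels)


# Toy binary example with made-up labels, not data from TweetSentBR.
toy_gold = ["positive", "positive", "negative", "negative"]
toy_pred = ["positive", "negative", "negative", "negative"]
print(accuracy(toy_gold, toy_pred))
print(macro_f_measure(toy_gold, toy_pred))
```

The same functions apply unchanged to the three-class setting, because the label set is derived from the data rather than fixed in advance.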

Code & Models

Repositories

https://bitbucket.org/HBrum/tweetsentbr
none · Official

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Advanced Text Analysis Techniques · Topic Modeling