TuPy-E: detecting hate speech in Brazilian Portuguese social media with a novel dataset and comprehensive analysis of models
Felipe Oliveira, Victoria Reis, Nelson Ebecken

TL;DR
This paper introduces TuPy-E, the largest annotated Portuguese hate speech dataset, and evaluates several models, including BERT, for detecting hate speech on Brazilian social media, with attention to language-specific challenges.
Contribution
The paper presents a novel large-scale Portuguese hate speech dataset and provides a comprehensive analysis of models, especially BERT, for effective detection in complex linguistic contexts.
Findings
TuPy-E is the largest annotated Portuguese hate speech dataset.
BERT models outperform traditional methods in hate speech detection.
The open-source approach encourages community collaboration.
Abstract
Social media has become integral to human interaction, providing a platform for communication and expression. However, the rise of hate speech on these platforms poses significant risks to individuals and communities. Detecting and addressing hate speech is particularly challenging in languages like Portuguese due to its rich vocabulary, complex grammar, and regional variations. To address this, we introduce TuPy-E, the largest annotated Portuguese corpus for hate speech detection. TuPy-E leverages an open-source approach, fostering collaboration within the research community. We conduct a detailed analysis using advanced techniques like BERT models, contributing to both academic understanding and practical applications.
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Internet Traffic Analysis and Secure E-voting
Methods: Multi-Head Attention · Attention Is All You Need · Attention Dropout · Dense Connections · Weight Decay · WordPiece · Layer Normalization · Residual Connection · Linear Warmup With Linear Decay
