AI-UPV at IberLEF-2021 DETOXIS task: Toxicity Detection in Immigration-Related Web News Comments Using Transformers and Statistical Models
Angel Felipe Magnossão de Paula, Ipek Baris Schlicht

TL;DR
This paper presents a system that uses BERT-based models to detect toxicity and xenophobia in comments on Spanish online news articles about immigration. The system ranks among the top entries in the DETOXIS shared task and shows that BERT models, particularly monolingual ones, outperform statistical methods.
Contribution
The study applies BERT models, especially BETO (a monolingual Spanish BERT), to toxicity detection in Spanish comments. These models outperform traditional statistical models, supporting the effectiveness of monolingual BERT.
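To make the contrast with "traditional statistical models" concrete, here is a minimal bag-of-words Multinomial Naive Bayes classifier in plain Python. It is a hypothetical illustration of the kind of statistical baseline a BERT model is compared against, not necessarily one of the models the authors used. The class name `NaiveBayesToxicity`, the whitespace tokenizer, and the toy Spanish comments are all assumptions.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (assumption); a real system would
    # handle Spanish punctuation and accents more carefully.
    return text.lower().split()


class NaiveBayesToxicity:
    """Hypothetical Multinomial Naive Bayes baseline over word counts."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, texts: list[str], labels: list[int]) -> "NaiveBayesToxicity":
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary = every word seen in any class.
        self.vocab = set().union(*self.word_counts.values())
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text: str) -> int:
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[c][tok]
                score += math.log((count + self.alpha) / (self.total[c] + self.alpha * v))
            scores[c] = score
        # Return the class with the highest log-posterior.
        return max(scores, key=scores.get)


# Toy data (hypothetical, not from DETOXIS): 1 = toxic, 0 = not toxic.
train_texts = [
    "que se vayan todos de aquí",
    "fuera inmigrantes",
    "bienvenidos a todos",
    "gracias por la noticia",
]
train_labels = [1, 1, 0, 0]
model = NaiveBayesToxicity().fit(train_texts, train_labels)
print(model.predict("fuera todos"))      # → 1
print(model.predict("gracias a todos"))  # → 0
```

A model like this only sees isolated word counts, whereas a fine-tuned BERT model encodes each word in context, which is one plausible reason for the gap the study reports.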
Findings
BERT models outperform statistical models in toxicity detection.
BETO achieved 3rd place in Task 1 (toxicity detection) with an F1-score of 0.5996.
Monolingual BERT models have an advantage over multilingual ones.
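Task 1 is scored with the F1-score. The sketch below computes it from raw counts, assuming binary labels where 1 marks a toxic comment and F1 is taken over the toxic (positive) class; the official DETOXIS scorer may differ in details such as averaging. The function name `f1_score` and the toy labels are hypothetical.

```python
def f1_score(gold: list[int], pred: list[int], positive: int = 1) -> float:
    """F1 of the positive (toxic) class, computed from raw counts."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is 0 (or undefined), so F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


gold = [1, 0, 1, 1, 0, 0]
pred = [1, 1, 0, 1, 0, 0]  # 2 true positives, 1 false positive, 1 false negative
print(round(f1_score(gold, pred), 4))  # → 0.6667
```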
Abstract
This paper describes our participation in the DEtection of TOXicity in comments In Spanish (DETOXIS) shared task 2021 at the 3rd Workshop on Iberian Languages Evaluation Forum. The shared task is divided into two related classification tasks: (i) Task 1, toxicity detection, and (ii) Task 2, toxicity level detection. Both tasks focus on xenophobia, a problem exacerbated by the spread of toxic comments posted on online news articles related to immigration. One necessary step towards mitigating this problem is detecting toxicity in the comments. Our main objective was to implement an accurate model to detect xenophobia in comments on web news articles within the DETOXIS shared task 2021, as measured by the competition's official metrics: the F1-score for Task 1 and the Closeness Evaluation Metric (CEM) for Task 2. To solve the tasks, we worked with two types of machine learning…
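Task 2 is scored with the Closeness Evaluation Metric. The sketch below follows our reading of its ordinal variant, CEM^ORD, from Amigó et al. (2020): the proximity between a predicted level a and a gold level b is -log of the fraction of gold items lying between them (counting class a at half weight), and the score is the summed proximity of all predictions divided by that of a perfect prediction. It assumes toxicity levels are consecutive integers (for example 0 to 3). `cem_ord` is a hypothetical name, and this is not the official evaluation script.

```python
import math
from collections import Counter


def cem_ord(gold: list[int], pred: list[int]) -> float:
    """Sketch of the ordinal Closeness Evaluation Metric (CEM^ORD)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    n = Counter(gold)  # class frequencies in the gold standard
    total = len(gold)

    def prox(a: int, b: int) -> float:
        # Mass between a and b: half of class a, plus every class after a
        # up to and including b (in whichever direction b lies).
        lo, hi = (a + 1, b) if a <= b else (b, a - 1)
        mass = n[a] / 2 + sum(n[k] for k in range(lo, hi + 1))
        return -math.log2(mass / total)

    numerator = sum(prox(p, g) for p, g in zip(pred, gold))
    denominator = sum(prox(g, g) for g in gold)  # score of a perfect system
    return numerator / denominator


gold = [0, 0, 1, 2, 3]
near = [0, 0, 1, 2, 2]  # last comment off by one level
far = [0, 0, 1, 2, 0]   # last comment off by three levels
print(cem_ord(gold, gold))  # → 1.0
print(cem_ord(gold, near), cem_ord(gold, far))
```

Because numerator and denominator use the same logarithm, the base cancels in the ratio. A prediction one level away from the gold label earns more credit than one several levels away, and a miss costs more when the gold class is rare.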
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Text Readability and Simplification · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · WordPiece · Dropout · Weight Decay · Residual Connection · Dense Connections
