AI-UPV at IberLEF-2021 DETOXIS task: Toxicity Detection in Immigration-Related Web News Comments Using Transformers and Statistical Models

Angel Felipe Magnossão de Paula; Ipek Baris Schlicht

arXiv:2111.04530 · cs.CL · November 9, 2021

TL;DR

This paper presents a system that uses BERT-based models to detect toxicity and xenophobia in Spanish online news comments. Its BETO model placed 3rd in the DETOXIS shared task with an F1-score of 0.5996, and the results indicate that BERT models outperform statistical models and that monolingual BERT models have an advantage over multilingual ones.

Contribution

The study applies BERT models, especially BETO, to toxicity detection in Spanish comments and shows that they outperform traditional statistical models, with the monolingual model proving more effective than the multilingual one.

Findings

01 BERT models outperform statistical models in toxicity detection.

02 BETO placed 3rd with an F1-score of 0.5996, the official metric for Task 1.

03 Monolingual BERT models have an advantage over multilingual ones.
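
For reference, the F1-score reported above is the harmonic mean of precision and recall. The sketch below computes it for a binary toxic/non-toxic labelling in plain Python. The function name `f1_binary`, the label encoding (1 = toxic), and the toy labels are assumptions made for illustration; they are not taken from the paper or from the DETOXIS evaluation scripts.

```python
def f1_binary(y_true, y_pred, positive=1):
    """F1-score of the `positive` class for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so the score is reported as 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = toxic comment, 0 = non-toxic comment.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
print(round(f1_binary(gold, pred), 4))  # 0.6667
```

In this toy case there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and so is their harmonic mean.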

Abstract

This paper describes our participation in the DEtection of TOXicity in comments In Spanish (DETOXIS) shared task 2021 at the 3rd Workshop on Iberian Languages Evaluation Forum. The shared task is divided into two related classification tasks: (i) Task 1: toxicity detection; and (ii) Task 2: toxicity level detection. They focus on the problem of xenophobia, which is exacerbated by the spread of toxic comments posted on online news articles related to immigration. One of the necessary efforts towards mitigating this problem is to detect toxicity in the comments. Our main objective was to implement an accurate model to detect xenophobia in comments about web news articles within the DETOXIS shared task 2021, based on the competition's official metrics: the F1-score for Task 1 and the Closeness Evaluation Metric (CEM) for Task 2. To solve the tasks, we worked with two types of machine learning…

Code & Models

Repositories

angelfelipemp/machine-learning-tweets-classification
Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Text Readability and Simplification · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · WordPiece · Dropout · Weight Decay · Residual Connection · Dense Connections