Fine-tuning of Pre-trained Transformers for Hate, Offensive, and Profane Content Detection in English and Marathi

Anna Glazkova; Michael Kadantsev; Maksim Glazkov

arXiv:2110.12687 · cs.CL · October 18, 2022 · 6 cites



TL;DR

This paper develops neural transformer-based models for detecting hate, offensive, and profane content in English and Marathi, achieving competitive results in shared tasks through fine-tuning and language-agnostic embeddings.

Contribution

It introduces a fine-tuning approach for transformers on multilingual hate speech detection and applies language-agnostic embeddings for Marathi content classification.

Findings

01 English models reached F1-scores of 81.99% on Subtask A (ranked third) and 65.77% on Subtask B (ranked second)

02 Marathi model achieved 88.08% F1-score

03 Transformer fine-tuning improved hate speech detection performance

Abstract

This paper describes neural models developed for the Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages Shared Task 2021. Our team, neuro-utmn-thales, participated in two tasks on binary and fine-grained classification of English tweets that contain hate, offensive, and profane content (English Subtasks A & B) and one task on identification of problematic content in Marathi (Marathi Subtask A). For the English subtasks, we investigate the impact of additional hate speech detection corpora on fine-tuning transformer models. We also apply a one-vs-rest approach based on Twitter-RoBERTa to discriminate between hate, profane, and offensive posts. Our models ranked third in English Subtask A with an F1-score of 81.99% and second in English Subtask B with an F1-score of 65.77%. For the Marathi tasks, we propose a system based on the…
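
The one-vs-rest step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the label names, the `one_vs_rest_predict` helper, and the keyword-based stand-in scorers are all hypothetical. In a real system each scorer would presumably be a separately fine-tuned binary classifier returning the probability that a post belongs to its class.

```python
from typing import Callable, Dict

# Hypothetical label set, taken from the classes named in the abstract.
LABELS = ["hate", "offensive", "profane"]

# A binary scorer maps a text to the estimated probability of "its" class.
BinaryScorer = Callable[[str], float]


def one_vs_rest_predict(text: str, scorers: Dict[str, BinaryScorer]) -> str:
    """Run every per-class binary scorer and return the highest-scoring label."""
    scores = {label: scorer(text) for label, scorer in scorers.items()}
    return max(scores, key=scores.get)


def make_keyword_scorer(keyword: str) -> BinaryScorer:
    """Toy stand-in for a fine-tuned binary model (an assumption for this demo)."""
    def score(text: str) -> float:
        return 0.9 if keyword in text.lower() else 0.1
    return score


# One toy scorer per class; each only reacts to its own label word.
toy_scorers = {label: make_keyword_scorer(label) for label in LABELS}

prediction = one_vs_rest_predict("an example PROFANE post", toy_scorers)
print(prediction)
```

The design choice here is that each class gets its own "this class vs. everything else" model, and the final label is whichever model is most confident. Ties resolve to the first label in dictionary order in this sketch; a production system would need an explicit tie-breaking or thresholding rule.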


Code & Models

Repositories

ixomaxip/hasoc (official)


Taxonomy

Topics: Hate Speech and Cyberbullying Detection

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Residual Connection · WordPiece · Dense Connections · Linear Warmup With Linear Decay · Weight Decay