Fine-tuning of Pre-trained Transformers for Hate, Offensive, and Profane Content Detection in English and Marathi
Anna Glazkova, Michael Kadantsev, Maksim Glazkov

TL;DR
This paper develops neural transformer-based models for detecting hate, offensive, and profane content in English and Marathi, achieving competitive results in shared tasks through fine-tuning and language-agnostic embeddings.
Contribution
It fine-tunes transformer models on additional hate speech corpora for the English subtasks, uses a one-vs-rest setup based on Twitter-RoBERTa for fine-grained classification, and applies language-agnostic embeddings to Marathi content classification.
Findings
English models achieved an F1-score of 81.99% on Subtask A (ranked third) and 65.77% on Subtask B (ranked second)
Marathi model achieved 88.08% F1-score
Transformer fine-tuning improved hate speech detection performance
Abstract
This paper describes neural models developed for the Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages Shared Task 2021. Our team, neuro-utmn-thales, participated in two tasks on binary and fine-grained classification of English tweets that contain hate, offensive, and profane content (English Subtasks A & B) and one task on identification of problematic content in Marathi (Marathi Subtask A). For the English subtasks, we investigate the impact of using additional hate speech detection corpora to fine-tune transformer models. We also apply a one-vs-rest approach based on Twitter-RoBERTa to discriminate between hate, profane, and offensive posts. Our models ranked third in English Subtask A with an F1-score of 81.99% and second in English Subtask B with an F1-score of 65.77%. For the Marathi tasks, we propose a system based on the…
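The one-vs-rest step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's binary models are based on Twitter-RoBERTa, whereas the scorers below are hypothetical toy functions (`keyword_scorer`, with placeholder keywords) standing in for fine-tuned classifiers. Only the decision rule is shown, assuming each class has its own class-vs-rest scorer and the highest-scoring class wins.

```python
from typing import Callable, Dict, Set

# Hypothetical stand-in for a fine-tuned binary classifier: takes a text and
# returns a score for "this text belongs to my class" (higher = more likely).
Scorer = Callable[[str], float]


def keyword_scorer(keywords: Set[str]) -> Scorer:
    """Build a toy scorer: the fraction of tokens found in the keyword set."""
    def score(text: str) -> float:
        tokens = text.lower().split()
        if not tokens:
            return 0.0
        hits = sum(1 for tok in tokens if tok in keywords)
        return hits / len(tokens)
    return score


def one_vs_rest_predict(text: str, scorers: Dict[str, Scorer]) -> str:
    """Run every class-vs-rest scorer and return the highest-scoring class."""
    scores = {label: scorer(text) for label, scorer in scorers.items()}
    return max(scores, key=scores.get)


# One binary scorer per fine-grained class; the keywords are placeholders.
scorers = {
    "hate": keyword_scorer({"hateword"}),
    "offensive": keyword_scorer({"insultword"}),
    "profane": keyword_scorer({"swearword"}),
}

label = one_vs_rest_predict("some swearword here", scorers)
```

In a real system each scorer would be replaced by the positive-class probability of a separately fine-tuned binary model; the argmax rule itself stays the same.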
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Residual Connection · WordPiece · Dense Connections · Linear Warmup With Linear Decay · Weight Decay
