Overview of the HASOC track at FIRE 2020: Hate Speech and Offensive Content Identification in Indo-European Languages

Thomas Mandl; Sandip Modha; Gautam Kishore Shahi; Amit Kumar Jaiswal; Durgesh Nandini; Daksh Patel; Prasenjit Majumder; Johannes Schäfer

arXiv:2108.05927·cs.CL·August 16, 2021

Open Access

TL;DR

This paper reviews the HASOC track at FIRE 2020, which focused on developing multilingual hate speech detection algorithms for English, Hindi, and German using Twitter data, with transformer models like BERT showing strong performance.

Contribution

It introduces a multilingual hate speech detection benchmark with datasets and tasks for English, Hindi, and German, and evaluates various algorithms including transformer-based models.

Findings

01. Best F1 scores around 0.52 for binary classification

02. Transformer models like BERT performed best

03. Multilingual datasets enable cross-language hate speech detection

Abstract

With the growth of social media, the spread of hate speech is increasing rapidly. Social media are widely used in many countries, and hate speech is spreading in these countries as well. This creates a need for multilingual hate speech detection algorithms. Much current research in this area is dedicated to English. The HASOC track intends to provide a platform to develop and optimize hate speech detection algorithms for Hindi, German and English. The dataset is collected from a Twitter archive and pre-classified by a machine learning system. HASOC has two sub-tasks for all three languages: task A is a binary classification problem (Hate and Not Offensive), while task B is a fine-grained classification problem with three classes: hate speech (HATE), OFFENSIVE and PROFANITY. Overall, 252 runs were submitted by 40 teams. The performance of the best classification algorithms for task A are…
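As a rough illustration of how runs on the two sub-tasks could be scored, the sketch below computes an F1 measure over a label set. It is not the track's official evaluation script. The label strings `HATE_OFFENSIVE` and `NOT` for task A are hypothetical names chosen for this example, and the choice of macro-averaging (an unweighted mean of per-class F1) is an assumption, since the abstract only refers to "F1".

```python
# Hypothetical label names for illustration; the abstract only names the
# task B classes (HATE, OFFENSIVE, PROFANITY) explicitly.
TASK_A_LABELS = ("HATE_OFFENSIVE", "NOT")
TASK_B_LABELS = ("HATE", "OFFENSIVE", "PROFANITY")


def f1_for_label(gold, pred, label):
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 (macro-averaging is an assumption)."""
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)


# Toy task A example with made-up labels, not data from the track.
gold_a = ["HATE_OFFENSIVE", "NOT", "HATE_OFFENSIVE", "NOT"]
pred_a = ["HATE_OFFENSIVE", "HATE_OFFENSIVE", "HATE_OFFENSIVE", "NOT"]
score_a = macro_f1(gold_a, pred_a, TASK_A_LABELS)
```

The same `macro_f1` function applies to task B by passing `TASK_B_LABELS`. Macro-averaging gives each class equal weight regardless of frequency, which matters when offensive posts are a minority of the data.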

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Internet Traffic Analysis and Secure E-voting · Advanced Malware Detection Techniques

Methods: Attention Is All You Need · Linear Layer · WordPiece · Attention Dropout · Dropout · Residual Connection · Adam · Multi-Head Attention · Dense Connections