A Text-to-Text Model for Multilingual Offensive Language Identification
Tharindu Ranasinghe, Marcos Zampieri

TL;DR
This paper introduces an encoder-decoder T5 model for multilingual offensive language identification that outperforms existing models on several benchmarks across multiple languages.
Contribution
It presents the first encoder-decoder (T5-based) model for offensive language identification. The model is trained on two large offensive language identification datasets, SOLID and CCTK, and outperforms existing models in multiple languages.
Findings
The T5 model outperforms BERT and XLNet on offensive language identification.
The multilingual mT5 model achieves state-of-the-art results across six languages.
The models are publicly available for community use.
Abstract
The ubiquity of offensive content on social media is a growing cause for concern among companies and government organizations. Recently, transformer-based models such as BERT, XLNet, and XLM-R have achieved state-of-the-art performance in detecting various forms of offensive content (e.g., hate speech, cyberbullying, and cyberaggression). However, most of these models are limited by their encoder-only architecture, which restricts the number and types of labels in downstream tasks. To address these limitations, this study presents the first pre-trained model with an encoder-decoder architecture for offensive language identification, built with text-to-text transformers (T5) and trained on two large offensive language identification datasets: SOLID and CCTK. We investigate the effectiveness of combining the two datasets and selecting an optimal threshold in semi-supervised…
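As a rough illustration of the text-to-text framing described in the abstract (not the authors' code), the Python sketch below shows how an OLID/SOLID-style instance could be turned into a source/target string pair for an encoder-decoder model, and how generated text could be parsed back into labels. The task prefix `offensive: `, the 0.5 default score threshold, and the helper names `binarize`, `to_text_pair`, and `parse_labels` are hypothetical. The label codes follow the three-level OLID taxonomy (OFF/NOT, TIN/UNT, IND/GRP/OTH) used by SOLID.

```python
# Hypothetical sketch: offensive language identification as a text-to-text task.
# The prefix, threshold default, and helper names are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

TASK_PREFIX = "offensive: "  # hypothetical task prefix for the encoder input

# OLID taxonomy codes: level A (offensive?), B (targeted?), C (target type).
LEVEL_A = {"OFF", "NOT"}
LEVEL_B = {"TIN", "UNT"}
LEVEL_C = {"IND", "GRP", "OTH"}


def binarize(score: float, threshold: float = 0.5) -> str:
    """Turn a semi-supervised confidence score into a level-A label.

    The 0.5 default is an assumption; the paper investigates threshold selection.
    """
    return "OFF" if score >= threshold else "NOT"


@dataclass
class Example:
    text: str
    labels: list[str]  # e.g. ["OFF", "TIN", "GRP"]


def to_text_pair(ex: Example) -> tuple[str, str]:
    """Build (source, target) strings; the labels are simply target text."""
    return TASK_PREFIX + ex.text, " ".join(ex.labels)


def parse_labels(generated: str) -> dict[str, Optional[str]]:
    """Map decoder output back to one label per level, ignoring unknown tokens."""
    out: dict[str, Optional[str]] = {"A": None, "B": None, "C": None}
    for tok in generated.split():
        tok = tok.upper()
        if tok in LEVEL_A and out["A"] is None:
            out["A"] = tok
        elif tok in LEVEL_B and out["B"] is None:
            out["B"] = tok
        elif tok in LEVEL_C and out["C"] is None:
            out["C"] = tok
    return out


# Usage: a placeholder tweet with a hypothetical score of 0.83.
ex = Example("@USER example tweet", [binarize(0.83), "TIN", "GRP"])
src, tgt = to_text_pair(ex)
print(src)  # offensive: @USER example tweet
print(tgt)  # OFF TIN GRP
print(parse_labels(tgt))  # {'A': 'OFF', 'B': 'TIN', 'C': 'GRP'}
```

Because the labels are plain target text, adding a taxonomy level or a new label changes only the target strings. An encoder-only classifier, by contrast, needs an output head sized to a label set fixed at fine-tuning time.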
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Sparse Evolutionary Training · Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · WordPiece · Gated Linear Unit · Dropout · Attention Dropout · Weight Decay
