Hitachi at SemEval-2020 Task 12: Offensive Language Identification with Noisy Labels using Statistical Sampling and Post-Processing
Manikandan Ravikiran, Amin Ekant Muljibhai, Toshinori Miyoshi, Hiroaki Ozaki, Yuta Koreeda, Sakata Masayuki

TL;DR
This paper describes a hybrid BERT-based system for offensive language identification in Twitter data with noisy labels. The system combines statistical sampling of training tweets with wordlist-based post-processing, reaches a Macro-F1 of 0.90913, and includes an error analysis intended to guide future research.
Contribution
The paper combines statistical sampling and post-processing with a BERT classifier for offensive language identification from noisy labels.
Findings
Achieved a Macro-F1 score of 0.90913 on SemEval-2020 Task 12 Subtask A (English), placing 34th.
Developed a hybrid system combining sampling, BERT, and post-processing.
Provided detailed error analysis to guide future research.
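The post-processing stage of such a hybrid system could look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the rule (flip a non-offensive prediction to offensive when the tweet contains a listed word), the `OFF`/`NOT` label names, the `post_process` function, and the placeholder wordlist are all hypothetical.

```python
import re

# Hypothetical placeholder wordlist; the list used in the paper is not given here.
OFFENSIVE_WORDS = {"idiot", "moron"}


def post_process(tweet, label, wordlist=OFFENSIVE_WORDS):
    """Assumed rule: override a NOT prediction with OFF if a listed word appears."""
    # Lowercase and split into simple word tokens so matching ignores case.
    tokens = set(re.findall(r"[a-z']+", tweet.lower()))
    if label == "NOT" and tokens & wordlist:
        return "OFF"
    # Predictions that are already OFF, or that contain no listed word, are kept.
    return label
```

For example, `post_process("You are an IDIOT", "NOT")` would return `"OFF"` under this assumed rule, while a tweet with no listed word keeps the classifier's label.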
Abstract
In this paper, we present our participation in SemEval-2020 Task-12 Subtask-A (English Language) which focuses on offensive language identification from noisy labels. To this end, we developed a hybrid system with the BERT classifier trained with tweets selected using Statistical Sampling Algorithm (SA) and Post-Processed (PP) using an offensive wordlist. Our developed system achieved 34th position with Macro-averaged F1-score (Macro-F1) of 0.90913 over both offensive and non-offensive classes. We further show comprehensive results and error analysis to assist future research in offensive language identification with noisy labels.
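For readers unfamiliar with the metric, Macro-F1 is the unweighted mean of the per-class F1 scores, so the offensive and non-offensive classes count equally regardless of how many tweets each contains. A minimal sketch in plain Python follows; the `OFF`/`NOT` label names and the function names are assumptions made for illustration.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score for one class, treating that class as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, classes=("OFF", "NOT")):
    """Unweighted average of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)
```

Because the average is unweighted, a system that does well only on the majority class is penalized more than it would be under accuracy or micro-averaged F1.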
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax
