SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification
Sai Muralidhar Jayanthi, Akshat Gupta

TL;DR
This paper introduces an ensemble of mBERT and XLM-RoBERTa models, built on task-adaptive pre-training of multilingual BERT models, for offensive language identification in Dravidian languages. The system ranked 1st for Kannada, 2nd for Malayalam, and 3rd for Tamil in the shared task.
Contribution
It demonstrates the effectiveness of task-adaptive pre-training combined with ensemble methods for multilingual offensive language detection.
Findings
Ranked 1st for Kannada
Ranked 2nd for Malayalam
Ranked 3rd for Tamil
Abstract
In this paper we present our submission for the EACL 2021-Shared Task on Offensive Language Identification in Dravidian languages. Our final system is an ensemble of mBERT and XLM-RoBERTa models which leverage task-adaptive pre-training of multilingual BERT models with a masked language modeling objective. Our system was ranked 1st for Kannada, 2nd for Malayalam and 3rd for Tamil.
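The masked language modeling objective mentioned in the abstract can be illustrated with a minimal sketch of how one training example might be corrupted. This is not the authors' code: the 15% selection rate and the 80/10/10 replacement split follow the original BERT recipe and are assumed here, and the function name `mask_tokens`, the `[MASK]` string, and the toy vocabulary are hypothetical.

```python
import random

MASK = "[MASK]"  # assumed mask token string (BERT-style)


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Corrupt a token list for masked language modeling.

    Returns (inputs, labels). labels[i] is the original token at positions
    selected for prediction and None elsewhere, so a loss would be computed
    only on the selected positions.
    """
    rng = random.Random(seed)
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected, no prediction target
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK  # 80% of selected positions: mask token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: random vocabulary token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    # Toy example; real task-adaptive pre-training would use subword
    # tokens produced by the model's own tokenizer on the task corpus.
    sentence = "this is a toy comment from the task corpus".split()
    corrupted, targets = mask_tokens(sentence, vocab=sentence, seed=3)
    print(corrupted)
    print(targets)
```

In task-adaptive pre-training, examples like these would be drawn from the unlabeled text of the target task, and the pre-trained model would continue training on them before being fine-tuned for classification.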
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Natural Language Processing Techniques · Interpreting and Communication in Healthcare
Methods: Linear Layer · mBERT · Softmax · Attention Is All You Need · Dense Connections · Residual Connection · WordPiece · Attention Dropout · Adam
