IIITDWD-ShankarB@ Dravidian-CodeMixi-HASOC2021: mBERT based model for identification of offensive content in south Indian languages
Shankar Biradar, Sunil Saumya

TL;DR
This paper presents a multilingual BERT-based model for detecting offensive content in South Indian languages, specifically Malayalam and Tamil code-mixed sentences, achieving competitive F1 scores in a shared task.
Contribution
The study applies mBERT with classifiers to offensive content detection in Malayalam and Tamil code-mixed data, addressing a challenging multilingual problem.
Findings
Weighted F1 score of 0.70 for Malayalam data
Weighted F1 score of 0.573 for Tamil code-mixed data
Ranked fifth on the Malayalam data and eleventh on the Tamil code-mixed data
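For readers unfamiliar with the metric, the sketch below shows how a weighted F1 score is typically computed: the per-class F1 values are averaged, with each class weighted by its share of the true labels. The function name `weighted_f1` and the `OFF`/`NOT` labels are illustrative assumptions, not taken from the paper or the shared task's evaluation script.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (illustrative helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Each class contributes in proportion to its support.
        score += (n / total) * f1
    return score


# Hypothetical labels: "OFF" = offensive, "NOT" = not offensive.
gold = ["OFF", "OFF", "NOT", "NOT", "NOT", "NOT"]
pred = ["OFF", "NOT", "NOT", "NOT", "NOT", "OFF"]
print(round(weighted_f1(gold, pred), 4))
```

Because the weights follow class frequency, a model can reach a fairly high weighted F1 on imbalanced data while still doing poorly on the minority class.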
Abstract
In recent years, there has been a lot of focus on offensive content. The amount of offensive content generated by social media is increasing at an alarming rate. This created a greater need to address this issue than ever before. To address these issues, the organizers of "Dravidian-Code Mixed HASOC-2020" have created two challenges. Task 1 involves identifying offensive content in Malayalam data, whereas Task 2 includes Malayalam and Tamil Code Mixed Sentences. Our team participated in Task 2. In our suggested model, we experiment with multilingual BERT to extract features, and three different classifiers are used on extracted features. Our model received a weighted F1 score of 0.70 for Malayalam data and was ranked fifth; we also received a weighted F1 score of 0.573 for Tamil Code Mixed data and were ranked eleventh.
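The pipeline described in the abstract (mBERT as a feature extractor, followed by separate classifiers) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the checkpoint, the pooling strategy, or the three classifiers, so the `bert-base-multilingual-cased` checkpoint, the use of the [CLS] vector, and the three scikit-learn models here are all illustrative choices. The function names `embed_texts` and `fit_classifiers` are hypothetical.

```python
# Assumed checkpoint; the abstract only says "multilingual BERT".
MODEL_NAME = "bert-base-multilingual-cased"


def embed_texts(texts, batch_size=16, max_length=128):
    """Return one fixed-size feature vector per text from a frozen mBERT encoder."""
    # Imported lazily so this module loads even where torch/transformers are absent.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    model.eval()  # no fine-tuning: the encoder is used only to extract features

    features = []
    with torch.no_grad():
        for start in range(0, len(texts), batch_size):
            batch = texts[start:start + batch_size]
            enc = tokenizer(batch, padding=True, truncation=True,
                            max_length=max_length, return_tensors="pt")
            out = model(**enc)
            # Assumption: take the [CLS] position as the sentence representation.
            features.append(out.last_hidden_state[:, 0, :])
    return torch.cat(features).numpy()


def fit_classifiers(features, labels):
    """Fit three stand-in classifiers on the extracted features."""
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    # Assumption: these three models are placeholders for the paper's classifiers.
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "svm": SVC(),
        "random_forest": RandomForestClassifier(n_estimators=100),
    }
    return {name: clf.fit(features, labels) for name, clf in candidates.items()}
```

A typical use would be `models = fit_classifiers(embed_texts(train_texts), train_labels)`, where `train_texts` and `train_labels` stand for the task's training sentences and their labels. Each fitted model would then be compared on held-out data by weighted F1.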
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Attention Is All You Need · Linear Layer · Adam · Multi-Head Attention · Residual Connection · Dense Connections · Attention Dropout · Softmax · Linear Warmup With Linear Decay
