TL;DR
This paper explores offensive language detection in Tamil code-mixed YouTube comments using deep learning and transfer learning models. It proposes a selective translation and transliteration approach and reports ULMFiT as the best-performing model for this task.
Contribution
It introduces a novel approach combining selective translation and transliteration with multilingual transformers, and identifies ULMFiT as the best model for Tamil code-mixed offensive language detection.
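As a rough illustration of what "selective" could mean here, the sketch below routes each token of a code-mixed comment by script. Tokens already in Tamil script are kept, Latin-script tokens found in a small English word list go to a translation step, and the remaining alphabetic tokens are treated as romanized Tamil and go to a transliteration step. The routing rule, the `ENGLISH_WORDS` list, and the `translate` and `transliterate` callables are all hypothetical placeholders chosen for this example; the summary does not describe the paper's actual procedure at this level of detail.

```python
from typing import Callable, List

# Hypothetical, tiny English word list (an assumption for this sketch);
# a real system would use a dictionary or a language-identification model.
ENGLISH_WORDS = {"this", "movie", "is", "super", "very", "bad"}


def is_tamil_script(token: str) -> bool:
    """True if the token has any character in the Unicode Tamil block (U+0B80-U+0BFF)."""
    return any("\u0b80" <= ch <= "\u0bff" for ch in token)


def route_token(token: str) -> str:
    """Decide what to do with one token: 'keep', 'translate', or 'transliterate'."""
    if is_tamil_script(token):
        return "keep"  # already in native script
    if token.lower() in ENGLISH_WORDS:
        return "translate"  # assumed to be an English word
    if token.isalpha():
        return "transliterate"  # assumed to be romanized Tamil
    return "keep"  # numbers, punctuation, mixed tokens


def selective_convert(
    text: str,
    translate: Callable[[str], str],
    transliterate: Callable[[str], str],
) -> str:
    """Apply the routing rule token by token and rejoin with single spaces."""
    handlers = {"translate": translate, "transliterate": transliterate}
    out: List[str] = []
    for token in text.split():
        handler = handlers.get(route_token(token))
        out.append(handler(token) if handler else token)
    return " ".join(out)
```

The two callables are left to the caller so that any translation or transliteration backend can be plugged in without changing the routing logic.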
Findings
ULMFiT is reported as the best-performing model for this task.
mBERTBiLSTM also shows strong performance.
Proposed techniques improve results in low-resource language tasks.
Abstract
Offensive language detection on social media platforms has been an active field of research in recent years. In countries where English is not the native language, social media users mostly write their posts and comments in a code-mixed form of text. This poses several challenges for offensive content identification, and the low resources available for Tamil make the task much harder. The current study presents extensive experiments using multiple deep learning and transfer learning models to detect offensive content on YouTube. We propose a novel and flexible approach of selective translation and transliteration techniques to reap better results from fine-tuning and ensembling multilingual transformer networks like BERT, DistilBERT, and XLM-RoBERTa. The experimental results showed that ULMFiT is the best model for this task. The best performing models were ULMFiT and…
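The abstract mentions ensembling fine-tuned transformer models but does not say how the predictions are combined. One common option is soft voting, which averages the per-class probabilities from each model and picks the highest-scoring class. The sketch below assumes that rule; the label names and the probability values are made up for illustration and are not results from the paper.

```python
from typing import List, Sequence


def soft_vote(prob_rows: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities from several models for one comment."""
    if not prob_rows:
        raise ValueError("need at least one model's probabilities")
    n_classes = len(prob_rows[0])
    if any(len(row) != n_classes for row in prob_rows):
        raise ValueError("all models must score the same number of classes")
    n_models = len(prob_rows)
    return [sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)]


def predict(prob_rows: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Return the label whose averaged probability is highest."""
    avg = soft_vote(prob_rows)
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best]


# Hypothetical label set and illustrative scores from three models.
LABELS = ["not_offensive", "offensive"]
example_scores = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
print(predict(example_scores, LABELS))
```

Soft voting lets a confident model outweigh a hesitant one, whereas majority voting over hard labels would count each model equally.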
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Sigmoid Activation · Tanh Activation · Attention Dropout · Long Short-Term Memory · Variational Dropout · Dense Connections · Dropout · Weight Decay
