Benchmarking Multi-Task Learning for Sentiment Analysis and Offensive Language Identification in Under-Resourced Dravidian Languages

Adeep Hande; Siddhanth U Hegde; Ruba Priyadharshini; Rahul Ponnusamy; Prasanna Kumar Kumaresan; Sajeetha Thavareesan; Bharathi Raja Chakravarthi

arXiv:2108.03867·cs.CC·August 10, 2021

TL;DR

This paper evaluates multi-task learning models for sentiment analysis and offensive language detection in under-resourced Dravidian languages, demonstrating improved performance and efficiency over single-task models on code-mixed YouTube comments.

Contribution

It introduces a multi-task learning framework for under-resourced Dravidian languages, showing its effectiveness and providing benchmark results for these tasks.

Findings

01. Multi-task learning outperforms single-task models in accuracy.

02. Best models achieved high weighted F1-scores for all three languages.

03. The framework is adaptable to other sequence classification problems.

Abstract

Obtaining extensive annotated data for under-resourced languages is challenging, so in this research we investigate whether it is beneficial to train models using multi-task learning. Sentiment analysis and offensive language identification share similar discourse properties, and the selection of these two tasks is motivated by the lack of large labelled datasets of user-generated code-mixed text. This paper works on code-mixed YouTube comments in Tamil, Malayalam, and Kannada. Our framework is applicable to other sequence classification problems irrespective of dataset size. Experiments show that our multi-task learning model can achieve high results compared with single-task learning while reducing the time and space required to train separate models on the individual tasks. Analysis of fine-tuned models indicates the preference of multi-task learning over…

Code & Models

Repositories

siddhanthhegde/dravidian-mtl-benchmarking (PyTorch, official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection