muBoost: An Effective Method for Solving Indic Multilingual Text Classification Problem
Manish Pathak, Aditya Jain

TL;DR
This paper introduces muBoost, an ensemble method combining CatBoost and MURIL models, achieving state-of-the-art results in multilingual abusive comment detection across 13 Indic languages on the Moj platform.
Contribution
The paper presents muBoost, a novel ensemble approach that improves abusive comment classification accuracy in multiple Indic languages using combined CatBoost and MURIL models.
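This summary does not say how muBoost combines the outputs of its CatBoost and MURIL components. As a minimal sketch, assuming each model emits a probability that a comment is abusive, one common way to combine them is weighted soft voting. The function name `soft_vote`, the weights, and the 0.5 threshold below are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Optional, Sequence


def soft_vote(
    prob_lists: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    threshold: float = 0.5,
) -> List[int]:
    """Average per-model abusive-class probabilities, then threshold.

    prob_lists[m][i] is model m's probability that comment i is abusive.
    This is a generic soft-voting ensemble, assumed here for illustration;
    it is not confirmed to be the combination rule used by muBoost.
    """
    if not prob_lists:
        raise ValueError("need probabilities from at least one model")
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0] * n_models  # equal weighting by default
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    n_items = len(prob_lists[0])
    if any(len(p) != n_items for p in prob_lists):
        raise ValueError("all models must score the same comments")

    total = sum(weights)
    labels = []
    for i in range(n_items):
        avg = sum(w * p[i] for w, p in zip(weights, prob_lists)) / total
        labels.append(1 if avg >= threshold else 0)  # 1 = abusive
    return labels


# Hypothetical probabilities for three comments from two models.
boosted_tree_probs = [0.9, 0.2, 0.6]
transformer_probs = [0.7, 0.4, 0.3]
print(soft_vote([boosted_tree_probs, transformer_probs]))
```

In practice the per-model probabilities would come from the trained classifiers themselves; the hard-coded lists here only stand in for those outputs.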
Findings
Achieved a mean F1-score of 89.286 on test data.
Improved by roughly 1.8 points over the baseline MURIL model, which scored an F1 of 87.48.
Demonstrated effectiveness in multilingual abusive comment detection.
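For readers unfamiliar with the metric behind these findings, the sketch below computes a binary F1-score and averages it over several groups. The summary does not say what the reported mean is taken over, so the per-language grouping, the function names, and the toy labels are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for the positive (abusive = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1(groups: Dict[str, Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Unweighted mean of F1 over groups (assumed here to be languages)."""
    scores = [f1_score(y_true, y_pred) for y_true, y_pred in groups.values()]
    return sum(scores) / len(scores)


# Toy labels only; these are not data from the paper.
toy = {
    "Hindi": ([1, 1, 0, 0], [1, 0, 1, 0]),
    "Telugu": ([1, 1, 0], [1, 1, 0]),
}
print(mean_f1(toy) * 100)  # scaled to the 0-100 range used in the findings
```

The findings report scores on a 0 to 100 scale, which is why the example multiplies the mean by 100 before printing.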
Abstract
Text classification is an integral part of many Natural Language Processing tasks, such as sarcasm detection and sentiment analysis. Many e-commerce websites and social-media/entertainment platforms use such models to enhance user experience, generating traffic and thus revenue on their platforms. In this paper, we present our solution to the Multilingual Abusive Comment Identification Problem on Moj, an Indian video-sharing social networking service powered by ShareChat. The problem involves detecting abusive comments, in 13 regional Indic languages such as Hindi, Telugu, and Kannada, on videos on the Moj platform. Our solution utilizes the novel muBoost, an ensemble of CatBoost classifier models and the Multilingual Representations for Indian Languages (MURIL) model, to produce SOTA performance on Indic text classification tasks. We were able to achieve a…
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Hate Speech and Cyberbullying Detection · Spam and Phishing Detection
