Vulgar Remarks Detection in Chittagonian Dialect of Bangla
Tanjim Mahmud, Michal Ptaszynski, Fumito Masui

TL;DR
This paper develops machine learning models to automatically detect vulgar remarks in the Chittagonian dialect of Bangla, a low-resource language, demonstrating promising accuracy with traditional ML methods and exploring deep learning limitations.
Contribution
It introduces the first ML-based approach for vulgar remark detection in Chittagonian Bangla and compares traditional and deep learning methods in this low-resource context.
Findings
Logistic Regression achieved 0.91 accuracy.
Simple RNN with Word2vec and fastText achieved 0.84-0.90 accuracy.
Neural network algorithms require more data for effective performance.
Abstract
The negative effects of online bullying and harassment are increasing with Internet popularity, especially in social media. One solution is using natural language processing (NLP) and machine learning (ML) methods for the automatic detection of harmful remarks, but these methods are limited in low-resource languages like the Chittagonian dialect of Bangla. This study focuses on detecting vulgar remarks in social media using supervised ML and deep learning algorithms. Logistic Regression achieved promising accuracy (0.91), while simple RNN with Word2vec and fastText had lower accuracy (0.84-0.90), highlighting the issue that neural network (NN) algorithms require more data.
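To make the supervised setup concrete, the following is a minimal sketch of a bag-of-words logistic regression classifier of the general kind the abstract refers to. It is not the authors' implementation: the whitespace tokenizer, the count features, the hyperparameters (`epochs`, `lr`), and the toy data are all assumptions for illustration, and the placeholder tokens `badword1` and `badword2` stand in for real Chittagonian vocabulary.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: plain whitespace tokenization; the paper's preprocessing is not given here.
    return text.lower().split()

def build_vocab(texts):
    # Map each distinct token to a feature index.
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(text, vocab):
    # Sparse bag-of-words counts; tokens outside the vocabulary are ignored.
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: float(c) for tok, c in counts.items()}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(x, weights, bias):
    return bias + sum(weights[i] * v for i, v in x.items())

def train(texts, labels, epochs=300, lr=0.5):
    # Full-batch gradient descent on the logistic (cross-entropy) loss.
    vocab = build_vocab(texts)
    rows = [featurize(t, vocab) for t in texts]
    weights = [0.0] * len(vocab)
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(score(x, weights, bias)) - y
            for i, v in x.items():
                grad_w[i] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return vocab, weights, bias

def predict_proba(text, model):
    # Estimated probability that the text is vulgar (label 1).
    vocab, weights, bias = model
    return sigmoid(score(featurize(text, vocab), weights, bias))

# Hypothetical toy data: 1 = vulgar, 0 = not vulgar.
texts = ["you badword1", "you badword2", "badword1 badword2",
         "you nice", "you kind", "nice kind"]
labels = [1, 1, 1, 0, 0, 0]
model = train(texts, labels)
```

Calling `predict_proba("you badword2 badword1", model)` returns the estimated probability that the remark is vulgar; thresholding it at 0.5 gives a binary decision. A linear model like this has only one weight per vocabulary item, which is one plausible reason such models remain competitive when labeled data is scarce.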
