Vulgar Remarks Detection in Chittagonian Dialect of Bangla

Tanjim Mahmud; Michal Ptaszynski; Fumito Masui

arXiv:2308.15448·cs.CL·August 30, 2023

TL;DR

This paper develops machine learning models to automatically detect vulgar remarks in the Chittagonian dialect of Bangla, a low-resource language. Logistic Regression reaches 0.91 accuracy, while a simple RNN with Word2vec and fastText embeddings scores lower (0.84-0.90), which the authors attribute to neural networks needing more training data.

Contribution

It introduces the first ML-based approach for vulgar remark detection in Chittagonian Bangla and compares traditional and deep learning methods in this low-resource context.

Findings

01

Logistic Regression achieved 0.91 accuracy.

02

Simple RNN with Word2vec and fastText achieved 0.84-0.90 accuracy.

03

Neural network algorithms require more data for effective performance.

Abstract

The negative effects of online bullying and harassment are increasing with Internet popularity, especially in social media. One solution is using natural language processing (NLP) and machine learning (ML) methods for the automatic detection of harmful remarks, but these methods are limited in low-resource languages like the Chittagonian dialect of Bangla. This study focuses on detecting vulgar remarks in social media using supervised ML and deep learning algorithms. Logistic Regression achieved promising accuracy (0.91), while simple RNN with Word2vec and fastText had lower accuracy (0.84-0.90), highlighting the issue that neural network (NN) algorithms require more data.
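
To make the Logistic Regression baseline concrete, here is a minimal sketch of a bag-of-words logistic regression classifier trained with stochastic gradient descent. It is not the authors' implementation: the abstract does not state the features, tokenizer, or hyperparameters used, so the whitespace tokenizer, the count features, the learning rate, the epoch count, and all function names below are assumptions for illustration. The four training sentences are made-up English placeholders (with `VULGAR_A` and `VULGAR_B` standing in for vulgar terms), not data from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()


def build_vocab(texts):
    # Map each distinct token to a feature index.
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(text, vocab):
    # Sparse bag-of-words counts; tokens outside the vocabulary are dropped.
    counts = Counter(tokenize(text))
    return {vocab[t]: c for t, c in counts.items() if t in vocab}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, weights, bias):
    # Probability that the remark is vulgar (label 1).
    return sigmoid(bias + sum(weights[i] * v for i, v in x.items()))


def train(texts, labels, epochs=200, lr=0.5):
    # Per-example gradient descent on the log loss (hypothetical settings).
    vocab = build_vocab(texts)
    weights = [0.0] * len(vocab)
    bias = 0.0
    data = [vectorize(t, vocab) for t in texts]
    for _ in range(epochs):
        for x, y in zip(data, labels):
            err = predict_proba(x, weights, bias) - y
            for i, v in x.items():
                weights[i] -= lr * err * v
            bias -= lr * err
    return vocab, weights, bias


def accuracy(texts, labels, vocab, weights, bias):
    # Fraction of remarks whose thresholded prediction matches the label.
    correct = 0
    for t, y in zip(texts, labels):
        p = predict_proba(vectorize(t, vocab), weights, bias)
        correct += int((p >= 0.5) == bool(y))
    return correct / len(texts)


# Placeholder data: 1 = vulgar, 0 = not vulgar.
train_texts = [
    "you are a VULGAR_A fool",
    "have a nice day friend",
    "VULGAR_B get lost",
    "thank you friend very nice",
]
train_labels = [1, 0, 1, 0]

vocab, weights, bias = train(train_texts, train_labels)
train_acc = accuracy(train_texts, train_labels, vocab, weights, bias)
print("training accuracy:", train_acc)
```

Accuracy here is measured on the toy training set only, so it says nothing about the 0.91 reported in the paper; a real evaluation would use held-out Chittagonian data.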
