Using Machine Learning to Enhance the Detection of Obfuscated Abusive Words in Swahili: A Focus on Child Safety

Phyllis Nabangi; Abdul-Jalil Zakaria; Jema David Ndibwile

arXiv:2602.13455·cs.CL·February 17, 2026

Open Access

TL;DR

This paper explores machine learning techniques to detect obfuscated abusive language in Swahili online content, aiming to improve child safety by addressing linguistic challenges in a low-resource language.

Contribution

It applies machine learning models such as SVM, Logistic Regression, and Decision Trees to Swahili abuse detection, with an emphasis on data balancing and model optimization.

Findings

01. Models perform well with high-dimensional data

02. Data imbalance limits generalizability

03. Performance varies across models

Abstract

The rise of digital technology has dramatically increased the potential for cyberbullying and online abuse, necessitating enhanced measures for detection and prevention, especially among children. This study focuses on detecting obfuscated abusive language in Swahili, a low-resource language that poses unique challenges because of its limited linguistic resources and technological support. Swahili was chosen for its popularity and because it is the most widely spoken language in Africa, with over 16 million native speakers and upwards of 100 million speakers in total, spanning regions in East Africa and some parts of the Middle East. We employed machine learning models including Support Vector Machines (SVM), Logistic Regression, and Decision Trees, optimized through rigorous parameter tuning and combined with techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) to handle data imbalance. Our…
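To make the data-balancing step concrete, here is a minimal sketch of the interpolation idea behind SMOTE, written with only the Python standard library. It is not the authors' pipeline: the function name `smote_sketch`, the toy feature vectors, and the parameters `k` and `seed` are all assumptions made for illustration, and a real experiment would more likely rely on an established library implementation.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority-class points by interpolating between
    a randomly chosen minority sample and one of its k nearest
    minority-class neighbours (the core idea of SMOTE).

    Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)           # fixed seed keeps the sketch reproducible
    k = min(k, len(minority) - 1)       # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D feature vectors standing in for the (assumed) minority "abusive" class.
abusive = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
new_points = smote_sketch(abusive, n_synthetic=4)
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority class grows without exact duplicates. Oversampling of this kind is normally applied to the training split only, so that evaluation data stays untouched.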

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling · Bullying, Victimization, and Aggression