Using Machine Learning to Detect Fraudulent SMSs in Chichewa

Amelia Taylor; Amoss Robert

arXiv:2502.16947 · cs.LG · February 25, 2025

Open Access

TL;DR

This study introduces the first dataset for SMS fraud detection in Chichewa and shows that machine learning models can classify Chichewa SMSs with over 96% accuracy. Performance drops on translated versions of the dataset, which underscores the need for language-specific models.

Contribution

The paper creates and evaluates the first Chichewa SMS fraud dataset and assesses the feasibility of machine learning models for this language, highlighting challenges in multilingual NLP.

Findings

01. Models achieved over 96% accuracy on Chichewa data
02. Performance declined when using translated datasets
03. Data preprocessing impacts model effectiveness in multilingual settings
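
To make the fraud versus non-fraud classification task and the accuracy figure concrete, here is a minimal sketch of a bag-of-words classifier. It rests on several assumptions: this summary does not name the algorithms the paper evaluated, so multinomial Naive Bayes with add-one smoothing is used purely as an illustration, and `TOY_SAMPLES`, `train_nb`, `predict_nb`, and `accuracy` are hypothetical names. The messages are English placeholders, not entries from the Chichewa dataset.

```python
import math
from collections import Counter, defaultdict

# Hypothetical placeholder messages; NOT taken from the paper's dataset.
TOY_SAMPLES = [
    ("send money to this number now", "fraud"),
    ("you won a prize send pin", "fraud"),
    ("see you at home tonight", "non-fraud"),
    ("call me after class", "non-fraud"),
]

def train_nb(samples):
    """Count classes and per-class word frequencies from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log-posterior under Naive Bayes."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, samples):
    """Fraction of (text, label) pairs the model classifies correctly."""
    return sum(predict_nb(model, t) == y for t, y in samples) / len(samples)
```

A realistic evaluation would compute `accuracy` on held-out messages rather than on the training data. Whitespace tokenization is the simplest possible preprocessing choice and is likewise an assumption.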

Abstract

SMS-enabled fraud is of great concern globally. Building machine learning classifiers for SMS fraud requires suitable datasets for model training and validation. Most research has centred on datasets of SMSs in English. This paper introduces the first dataset for SMS fraud detection in Chichewa, a major language in Africa, and reports on experiments with machine learning algorithms for classifying SMSs in Chichewa as fraud or non-fraud. We answer the broader research question of how feasible it is to develop machine learning classification models for Chichewa SMSs. To do that, we created three datasets. A small dataset of SMSs in Chichewa was collected through primary research from a segment of the young population. We applied label-preserving text transformations to increase its size. The enlarged dataset was translated into English using two approaches:…
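
The idea of enlarging a small dataset with label-preserving text transformations can be sketched as follows. The abstract does not say which transformations the authors used, so the two operations here (swapping two words, dropping words at random) are assumptions chosen for illustration, and `random_swap`, `random_delete`, and `augment` are hypothetical names. Each perturbed copy simply inherits the label of the message it came from.

```python
import random

def random_swap(tokens, rng):
    """Swap two randomly chosen positions; the set of words is unchanged."""
    if len(tokens) < 2:
        return list(tokens)
    i, j = rng.sample(range(len(tokens)), 2)
    out = list(tokens)
    out[i], out[j] = out[j], out[i]
    return out

def random_delete(tokens, rng, p=0.1):
    """Drop each token with probability p, always keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def augment(dataset, copies=2, seed=0):
    """Return the original (text, label) pairs plus perturbed copies.

    Assumption: small word-level perturbations do not change whether a
    message is fraudulent, so every copy keeps its source label.
    """
    rng = random.Random(seed)  # fixed seed makes the output reproducible
    out = list(dataset)
    for text, label in dataset:
        tokens = text.split()
        if not tokens:
            continue  # nothing to perturb in an empty message
        for _ in range(copies):
            op = rng.choice([random_swap, random_delete])
            out.append((" ".join(op(tokens, rng)), label))
    return out
```

With `copies=2`, a dataset of n non-empty messages grows to 3n. Whether a given transformation truly preserves the label is something to check by hand on a sample, since aggressive deletion could remove the very words that mark a message as fraudulent.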

Taxonomy

Topics: Spam and Phishing Detection · Cybercrime and Law Enforcement Studies