Multilingual Financial Fraud Detection Using Machine Learning and Transformer Models: A Bangla-English Study
Mohammad Shihab Uddin, Md Hasibul Amin, Nusrat Jahan Ema, Bushra Uddin, Tanvir Ahmed, and Arif Hassan Zidan

TL;DR
This study explores multilingual financial fraud detection in Bangla-English messages, demonstrating that classical machine learning models can outperform transformer architectures in accuracy and F1 score, despite linguistic challenges.
Contribution
The paper presents a multilingual fraud detection approach for Bangla-English data, compares classical ML models with transformer models, and offers insights into the linguistic patterns of fraudulent messages.
Findings
Linear SVM achieved 91.59% accuracy and 91.30% F1 score.
Transformer models had higher fraud recall but more false positives.
Classical ML models remain competitive in multilingual fraud detection.
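The trade-off noted in the findings (higher fraud recall alongside more false positives) can be made concrete with a small sketch. The confusion-matrix counts below are hypothetical and are not taken from the paper. The function name `classification_metrics` is illustrative, and treating "fraud" as the positive class is an assumption, since the summary does not state how F1 was averaged.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 for the positive (fraud) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts on 100 fraud and 100 legitimate messages (not the paper's data).
# A conservative classifier misses more fraud but raises few false alarms.
conservative = classification_metrics(tp=85, fp=5, fn=15, tn=95)
# An aggressive classifier catches more fraud but flags many legitimate messages.
aggressive = classification_metrics(tp=95, fp=30, fn=5, tn=70)

for name, metrics in (("conservative", conservative), ("aggressive", aggressive)):
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

With these made-up counts, the aggressive classifier has the higher recall, yet its extra false positives pull down both accuracy and F1. This is one way a model with better fraud recall can still score lower overall.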
Abstract
Financial fraud detection has emerged as a critical research challenge amid the rapid expansion of digital financial platforms. Although machine learning approaches have demonstrated strong performance in identifying fraudulent activities, most existing research focuses exclusively on English-language data, limiting applicability to multilingual contexts. Bangla (Bengali), despite being spoken by over 250 million people, remains largely unexplored in this domain. In this work, we investigate financial fraud detection in a multilingual Bangla-English setting using a dataset comprising legitimate and fraudulent financial messages. We evaluate classical machine learning models (Logistic Regression, Linear SVM, and Ensemble classifiers) using TF-IDF features alongside transformer-based architectures. Experimental results using 5-fold stratified cross-validation demonstrate that Linear SVM…
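The 5-fold stratified cross-validation mentioned in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the helper `stratified_k_fold`, the label strings, and the class sizes are all assumptions made for illustration. The idea is that each fold keeps roughly the same ratio of fraudulent to legitimate messages as the full dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose class proportions mirror the full label list."""
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into k folds.
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves as the test set once; the remaining folds form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train_idx, test_idx


# Hypothetical label list: 20 fraudulent and 30 legitimate messages.
labels = ["fraud"] * 20 + ["legit"] * 30
splits = list(stratified_k_fold(labels, k=5))

for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    fraud_in_test = sum(labels[i] == "fraud" for i in test_idx)
    print(f"fold {fold_no}: train={len(train_idx)} test={len(test_idx)} fraud_in_test={fraud_in_test}")
```

In practice a library routine such as scikit-learn's `StratifiedKFold` would typically be used instead. The hand-written version only shows what "stratified" means for the splits.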
Taxonomy
Topics: Imbalanced Data Classification Techniques · Spam and Phishing Detection · Financial Distress and Bankruptcy Prediction
