Minimizing the Societal Cost of Credit Card Fraud with Limited and Imbalanced Data
Samuel Showalter, Zhixin Wu

TL;DR
This paper investigates machine-learning methods for reducing the societal cost of credit card fraud, focusing on data sampling, algorithm evaluation, and ensemble methods when fraud examples are scarce and heavily outnumbered by legitimate transactions.
Contribution
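It compares two sampling techniques (random undersampling and SMOTE), evaluates five machine learning algorithms on fraud cost savings, and tests whether an ensemble optimized with a genetic algorithm improves on its components, arguing that cost-based metrics matter more than traditional performance measures.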
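Evaluating a model on cost savings requires an explicit cost model, which this page does not state. The sketch below assumes a simple one: every alert costs a fixed review fee, every missed fraud loses the full transaction amount, and savings are measured against a baseline that flags nothing. The function names and the `review_cost` parameter are hypothetical and illustrative, not the authors' code.

```python
def fraud_cost(y_true, y_pred, amounts, review_cost):
    """Total cost of predictions under an assumed (hypothetical) cost model.

    - Every alert (y_pred == 1) costs a fixed review fee, fraud or not.
    - A missed fraud (y_true == 1, y_pred == 0) loses the transaction amount.
    - A correctly cleared legitimate transaction costs nothing.
    """
    total = 0.0
    for truth, pred, amount in zip(y_true, y_pred, amounts):
        if pred == 1:
            total += review_cost
        elif truth == 1:
            total += amount
    return total


def cost_savings(y_true, y_pred, amounts, review_cost):
    """Money saved relative to a baseline that flags nothing, so every fraud is lost."""
    baseline = fraud_cost(y_true, [0] * len(y_true), amounts, review_cost)
    return baseline - fraud_cost(y_true, y_pred, amounts, review_cost)


y_true = [0, 0, 1, 1, 0]
y_pred = [0, 1, 1, 0, 0]
amounts = [20.0, 35.0, 500.0, 40.0, 15.0]
print(cost_savings(y_true, y_pred, amounts, review_cost=5.0))  # 490.0
```

With these toy numbers the model pays two review fees and misses one 40.0 fraud, a cost of 50.0 against a do-nothing baseline of 540.0, so it saves 490.0.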
Findings
Monte Carlo simulations show random undersampling outperforms SMOTE in cost reduction.
Ensemble models did not outperform individual models in cost efficiency.
F1 score is uncorrelated with actual cost savings in fraud detection.
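One reason F1 can fail to track cost is that it counts frauds while cost weighs them by amount. The toy example below uses hypothetical data and an assumed cost model (each alert costs a fixed review fee, each missed fraud loses its amount). Two models have identical F1 but very different savings. It illustrates the mechanism only and does not reproduce the paper's analysis.

```python
def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def savings(y_true, y_pred, amounts, review_cost):
    """Assumed cost model: alerts cost review_cost, missed fraud loses its amount.
    Savings are measured against flagging nothing."""
    baseline = sum(a for t, a in zip(y_true, amounts) if t == 1)
    cost = sum(review_cost if p == 1 else (a if t == 1 else 0.0)
               for t, p, a in zip(y_true, y_pred, amounts))
    return baseline - cost


y_true = [1, 1, 0, 0, 0, 0]
amounts = [900.0, 10.0, 50.0, 50.0, 50.0, 50.0]
model_a = [1, 0, 0, 0, 0, 0]  # catches the large fraud
model_b = [0, 1, 0, 0, 0, 0]  # catches the small fraud

for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, round(f1_score(y_true, pred), 3), savings(y_true, pred, amounts, review_cost=5.0))
# A 0.667 895.0
# B 0.667 5.0
```

Both models find one of two frauds with no false alarms, so their F1 is the same, yet model A prevents a 900.0 loss while model B prevents only 10.0.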
Abstract
Machine learning has automated much of financial fraud detection, notifying firms of, or even blocking, questionable transactions instantly. However, data imbalance starves traditionally trained models of the examples necessary to detect fraud. This study examines three factors in credit card fraud detection via machine learning. First, it assesses whether two sampling methods, random undersampling and the Synthetic Minority Oversampling Technique (SMOTE), can improve algorithm performance in data-starved environments. Second, five industry-practical machine learning algorithms are evaluated on total fraud cost savings as well as traditional statistical metrics. Finally, an ensemble of the individual models is trained with a genetic algorithm in an attempt to achieve higher cost efficiency than its components. Monte Carlo performance distributions discerned random…
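The abstract says the ensemble is trained with a genetic algorithm but does not give the encoding. One plausible reading, assumed here rather than taken from the paper, is to evolve one weight per component model, average the models' fraud probabilities with those weights, threshold at 0.5, and score each weight vector by its negative total fraud cost. The cost model, truncation selection, uniform crossover, and clipped Gaussian mutation are all illustrative choices, and the data are hypothetical.

```python
import random


def fraud_cost(y_true, y_pred, amounts, review_cost):
    """Assumed cost model: each alert costs review_cost; each missed fraud loses its amount."""
    total = 0.0
    for truth, pred, amount in zip(y_true, y_pred, amounts):
        if pred == 1:
            total += review_cost
        elif truth == 1:
            total += amount
    return total


def ensemble_predict(weights, model_scores, threshold=0.5):
    """Weighted average of each component model's fraud probability, then threshold."""
    total_w = sum(weights) or 1.0  # guard against an all-zero weight vector
    preds = []
    for i in range(len(model_scores[0])):
        score = sum(w * scores[i] for w, scores in zip(weights, model_scores)) / total_w
        preds.append(1 if score >= threshold else 0)
    return preds


def evolve_weights(model_scores, y_true, amounts, review_cost,
                   pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Search ensemble weights with a simple genetic algorithm.

    Fitness is the negative total fraud cost, so higher fitness means cheaper.
    """
    rng = random.Random(seed)
    n_models = len(model_scores)

    def fitness(w):
        return -fraud_cost(y_true, ensemble_predict(w, model_scores), amounts, review_cost)

    population = [[rng.random() for _ in range(n_models)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection; parents survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]  # uniform crossover
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1)))  # clipped Gaussian mutation
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Hypothetical validation data: model 0 is informative, model 1 is inverted noise.
y_true = [1, 0, 0, 1, 0]
amounts = [100.0, 20.0, 30.0, 200.0, 10.0]
model_scores = [
    [0.9, 0.1, 0.2, 0.8, 0.1],
    [0.2, 0.9, 0.8, 0.3, 0.7],
]
best = evolve_weights(model_scores, y_true, amounts, review_cost=5.0)
print(ensemble_predict(best, model_scores))
```

Because the parents are carried over unchanged, the best weight vector found is never lost between generations. In this toy setup the search should settle on weights that favour model 0, and the resulting ensemble does no better than model 0 alone, which is the kind of outcome the findings report.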
Taxonomy
Topics: Imbalanced Data Classification Techniques · Financial Distress and Bankruptcy Prediction · Machine Learning and Data Classification
Methods: Synthetic Minority Over-sampling Technique
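The two sampling methods the paper compares can be sketched in pure Python. Random undersampling discards legitimate (majority) transactions at random until the classes balance; SMOTE instead creates synthetic fraud examples by interpolating between a minority point and one of its k nearest minority neighbors. The 0/1 labels (1 = fraud) and the function names are assumptions for illustration. In practice, the imbalanced-learn library provides `RandomUnderSampler` and `SMOTE`, both used through `fit_resample`.

```python
import random


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows (label 0) until both classes are equal in size.

    Assumes label 1 is the minority (fraud) class and is no larger than class 0.
    """
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t == 1]
    majority = [(x, t) for x, t in zip(X, y) if t == 0]
    kept = rng.sample(majority, len(minority)) + minority
    rng.shuffle(kept)
    return [x for x, _ in kept], [t for _, t in kept]


def smote(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    Each new point lies on the segment between a randomly chosen minority
    point and one of its k nearest minority neighbors (squared Euclidean).
    Requires at least two minority points.
    """
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_minority))
        base = X_minority[i]
        others = [p for j, p in enumerate(X_minority) if j != i]
        neighbors = sorted(others, key=lambda p: dist2(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic
```

The trade-off is visible in the code: undersampling discards real legitimate transactions but adds no artificial points, while SMOTE keeps all data but can place synthetic fraud in regions where legitimate transactions live. This sketch does not reproduce the paper's Monte Carlo comparison.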
