Minimizing the Societal Cost of Credit Card Fraud with Limited and Imbalanced Data

Samuel Showalter; Zhixin Wu

arXiv:1909.01486 · cs.LG · September 6, 2019 · 1 citation

Open Access

TL;DR

This paper investigates how machine learning can reduce the societal cost of credit card fraud, focusing on data sampling, algorithm evaluation, and ensemble methods when training data is scarce and heavily imbalanced.

Contribution

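It compares two sampling techniques (undersampling and SMOTE), evaluates five industry-practical algorithms on total fraud cost savings as well as traditional statistical metrics, and explores a genetic-algorithm-optimized ensemble. Throughout, it argues that cost-based metrics matter more than traditional performance measures.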
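The gap between cost-based and traditional metrics can be illustrated with a small sketch. The cost model here is a hypothetical assumption, not the paper's:

- a missed fraud loses its full transaction amount;
- every flagged transaction incurs a fixed review fee (`REVIEW_FEE`, an illustrative value).

The helpers `f1_score`, `fraud_cost`, and `cost_savings` are likewise illustrative names. Two toy classifiers each catch one of two frauds, so their F1 scores are identical. Their cost savings differ sharply, though, because one catches the large fraud and the other the small one.

```python
from typing import Sequence

# Hypothetical cost model (an assumption for illustration, not the paper's):
# a missed fraud loses its full amount; every flagged transaction incurs a review fee.
REVIEW_FEE = 5.0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Standard F1 score for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fraud_cost(y_true, y_pred, amounts, review_fee=REVIEW_FEE) -> float:
    """Total cost: a review fee per flagged transaction plus the amount of each missed fraud."""
    cost = 0.0
    for t, p, a in zip(y_true, y_pred, amounts):
        if p == 1:
            cost += review_fee
        elif t == 1:
            cost += a
    return cost


def cost_savings(y_true, y_pred, amounts, review_fee=REVIEW_FEE) -> float:
    """Savings relative to a baseline that flags nothing and lets every fraud through."""
    baseline = sum(a for t, a in zip(y_true, amounts) if t == 1)
    return baseline - fraud_cost(y_true, y_pred, amounts, review_fee)


y_true = [1, 1, 0, 0, 0, 0]
amounts = [1000.0, 10.0, 50.0, 50.0, 50.0, 50.0]
pred_a = [1, 0, 0, 0, 0, 0]  # catches the large fraud
pred_b = [0, 1, 0, 0, 0, 0]  # catches the small fraud

print(round(f1_score(y_true, pred_a), 3), cost_savings(y_true, pred_a, amounts))  # 0.667 995.0
print(round(f1_score(y_true, pred_b), 3), cost_savings(y_true, pred_b, amounts))  # 0.667 5.0
```

F1 weights every fraud equally, so it cannot distinguish a model that saves 995 from one that saves 5 under this toy cost model.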

Findings

01

Monte Carlo simulations show random undersampling outperforms SMOTE in cost reduction.

02

Ensemble models did not outperform individual models in cost efficiency.

03

F1 score is uncorrelated with actual cost savings in fraud detection.
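The random undersampling and Monte Carlo evaluation in the findings can be sketched as follows. This illustration rests on several assumptions:

- the fraud label is 1;
- the hypothetical `ratio` parameter sets how many non-fraud rows are kept per fraud row;
- `evaluate` is a hypothetical callback standing in for "train a model on the resampled data and measure its cost savings on a held-out set".

Each trial uses a different random seed. The collected scores therefore form an empirical distribution rather than a single point estimate.

```python
import random
from statistics import mean, stdev


def random_undersample(X, y, ratio=1.0, seed=None):
    """Keep every minority (fraud, y == 1) row plus a random subset of majority rows.

    `ratio` is the number of majority rows kept per minority row (1.0 gives a 50/50 split).
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    n_keep = min(len(majority), round(ratio * len(minority)))
    kept = minority + rng.sample(majority, n_keep)  # majority rows drawn without replacement
    rng.shuffle(kept)
    return [X[i] for i in kept], [y[i] for i in kept]


def monte_carlo(evaluate, X, y, n_trials=100, ratio=1.0):
    """Repeat resample-then-evaluate with a different seed per trial; return every score.

    `evaluate(X_resampled, y_resampled)` is a hypothetical placeholder for training a
    model on the resampled data and scoring its cost savings on held-out data.
    """
    return [evaluate(*random_undersample(X, y, ratio, seed=trial)) for trial in range(n_trials)]


# Toy data: 20 frauds among 1,000 transactions (a 2% minority class).
X = list(range(1000))
y = [1] * 20 + [0] * 980

X_res, y_res = random_undersample(X, y, seed=0)
print(len(X_res), sum(y_res))  # 40 20

# Toy evaluate: mean feature value of the retained majority rows (stands in for cost savings).
scores = monte_carlo(
    lambda Xs, ys: mean(x for x, label in zip(Xs, ys) if label == 0), X, y, n_trials=50
)
print(f"mean={mean(scores):.1f}, sd={stdev(scores):.1f}")
```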

Abstract

Machine learning has automated much of financial fraud detection, notifying firms of, or even blocking, questionable transactions instantly. However, data imbalance starves traditionally trained models of the content necessary to detect fraud. This study examines three separate factors of credit card fraud detection via machine learning. First, it assesses the potential for different sampling methods, undersampling and Synthetic Minority Oversampling Technique (SMOTE), to improve algorithm performance in data-starved environments. Additionally, five industry-practical machine learning algorithms are evaluated on total fraud cost savings in addition to traditional statistical metrics. Finally, an ensemble of individual models is trained with a genetic algorithm to attempt to generate higher cost efficiency than its components. Monte Carlo performance distributions discerned random…
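The genetic-algorithm ensemble described in the abstract can be sketched as a search over weights that average the individual models' fraud probabilities, with cost savings as the fitness. The paper's encoding and operators are not given here, so every choice below is an assumption:

- weighted-average combination with a 0.5 flagging threshold;
- tournament selection, uniform crossover, and Gaussian mutation;
- elitism;
- the review-fee cost model from the earlier sketch.

`savings` and `evolve` are hypothetical names. The population is seeded with one one-hot weight vector per model, so the search starts from each individual model. With elitism, it can do no worse than the best single model on the data it is fit to.

```python
import random

REVIEW_FEE = 5.0  # hypothetical fixed cost per flagged transaction (assumption)


def savings(weights, probs, y_true, amounts, threshold=0.5):
    """Cost savings of a weighted-average ensemble: recovered fraud minus review fees."""
    total = sum(weights) or 1.0  # all-zero weights flag nothing
    saved = 0.0
    for i, (t, a) in enumerate(zip(y_true, amounts)):
        score = sum(w * p[i] for w, p in zip(weights, probs)) / total
        if score >= threshold:
            saved -= REVIEW_FEE
            if t == 1:
                saved += a
    return saved


def evolve(probs, y_true, amounts, pop_size=30, generations=40, mutation_sd=0.1, seed=0):
    """Genetic search for ensemble weights in [0, 1] that maximize cost savings."""
    rng = random.Random(seed)
    n_models = len(probs)

    def fitness(w):
        return savings(w, probs, y_true, amounts)

    # Seed with one one-hot vector per model (each individual model), then random weights.
    population = [[1.0 if j == m else 0.0 for j in range(n_models)] for m in range(n_models)]
    population += [[rng.random() for _ in range(n_models)] for _ in range(pop_size - n_models)]

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of 3 randomly drawn individuals.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # Uniform crossover, then Gaussian mutation clipped to [0, 1].
            child = [
                min(1.0, max(0.0, (a if rng.random() < 0.5 else b) + rng.gauss(0.0, mutation_sd)))
                for a, b in zip(p1, p2)
            ]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


# Toy validation set with fraud probabilities from three hypothetical models.
y_true = [1, 0, 1, 0, 0, 1]
amounts = [500.0, 20.0, 80.0, 35.0, 60.0, 300.0]
probs = [
    [0.9, 0.6, 0.3, 0.2, 0.1, 0.4],
    [0.4, 0.1, 0.8, 0.7, 0.2, 0.6],
    [0.7, 0.3, 0.6, 0.1, 0.5, 0.3],
]
best = evolve(probs, y_true, amounts)
print([round(w, 2) for w in best], savings(best, probs, y_true, amounts))
```

Because fitness is measured on the same data the weights are fit to, a real pipeline would check the chosen weights on a separate held-out set.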

Taxonomy

Topics: Imbalanced Data Classification Techniques · Financial Distress and Bankruptcy Prediction · Machine Learning and Data Classification

Methods: Synthetic Minority Over-sampling Technique (SMOTE)
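SMOTE creates synthetic minority samples by interpolating between a minority point and one of its k nearest minority-class neighbours. The pure-Python sketch below illustrates that interpolation step only. The function name `smote` and its parameters are illustrative, and it simplifies the original procedure by picking base points at random rather than iterating over every minority sample. In practice a maintained implementation, such as the `SMOTE` class in the imbalanced-learn package (used via `fit_resample`), would normally be preferred.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Generate `n_synthetic` points by interpolating between minority samples
    and randomly chosen members of their k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


fraud = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
new_points = smote(fraud, n_synthetic=5, k=2, seed=42)
print(new_points)  # every point lies on the line x == y, between existing frauds
```

Because each synthetic point lies on a segment between two real frauds, SMOTE adds variety without leaving the region the minority class already occupies. Random undersampling instead discards majority rows and creates no new points.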