Evaluating resampling methods on a real-life highly imbalanced online credit card payments dataset
François de la Bourdonnaye, Fabrice Daniel

TL;DR
This study evaluates various resampling techniques on a large, real-world imbalanced credit card fraud dataset, finding that most methods are ineffective or computationally infeasible, highlighting challenges in real-life fraud detection.
Contribution
The paper compares numerous state-of-the-art resampling methods on a large-scale, real-world online credit card payments dataset, revealing their limitations in practical scenarios.
Findings
Most resampling methods are inefficient or intractable on large datasets.
Resampling methods do not significantly improve detection metrics.
Real-life datasets pose unique challenges for imbalanced data handling.
Abstract
Many problems in machine-learning-based credit card fraud detection stem from the imbalance of transaction datasets. Indeed, the number of frauds is tiny compared to the number of regular transactions, which has been shown to damage learning performance; at worst, the algorithm can learn to classify all transactions as regular. Resampling methods and cost-sensitive approaches are known to be good candidates for addressing this imbalance. This paper evaluates numerous state-of-the-art resampling methods on a large real-life online credit card payments dataset. We show they are inefficient, either because the methods are intractable or because metrics do not exhibit substantial improvements. Our work contributes to this domain in (1) that we compare many state-of-the-art resampling methods on a large-scale dataset and in (2) that we use a real-life online…
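To make the imbalance problem described in the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the paper's pipeline: the data is synthetic, the 0.1% fraud rate is an assumed figure chosen only for illustration, and `random_undersample` is a hypothetical helper showing plain random undersampling, one of the simplest resampling methods. It shows how a classifier that labels every transaction as regular reaches high accuracy while detecting no fraud, and how undersampling rebalances the training set.

```python
import numpy as np

def random_undersample(X, y, rng):
    """Keep every fraud row (y == 1) and an equally sized random subset of regular rows (y == 0)."""
    minority_idx = np.flatnonzero(y == 1)
    majority_idx = np.flatnonzero(y == 0)
    kept_majority = rng.choice(majority_idx, size=minority_idx.size, replace=False)
    keep = np.concatenate([minority_idx, kept_majority])
    rng.shuffle(keep)  # avoid having all frauds first in the resampled set
    return X[keep], y[keep]

rng = np.random.default_rng(0)

# Synthetic stand-in for a transaction dataset (assumed 0.1% fraud rate).
n_samples, n_frauds = 100_000, 100
y = np.zeros(n_samples, dtype=int)
y[rng.choice(n_samples, size=n_frauds, replace=False)] = 1
X = rng.normal(size=(n_samples, 4))  # four placeholder features

# Degenerate classifier: predict "regular" for every transaction.
y_pred = np.zeros_like(y)
accuracy = float((y_pred == y).mean())
recall = float(((y_pred == 1) & (y == 1)).sum() / (y == 1).sum())
print(f"accuracy={accuracy:.3f}, fraud recall={recall:.3f}")

# Rebalance the training set to a 1:1 class ratio.
X_bal, y_bal = random_undersample(X, y, rng)
print(f"balanced set: {len(y_bal)} rows, {int(y_bal.sum())} frauds")
```

Accuracy alone hides the failure here, which is why imbalanced problems are usually judged with metrics such as recall or precision on the fraud class. Undersampling discards most regular transactions, so it trades information for balance; whether that helps on a real dataset is exactly the kind of question the paper investigates.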
Taxonomy
Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Machine Learning and Algorithms
