Empirical study of Machine Learning Classifier Evaluation Metrics behavior in Massively Imbalanced and Noisy data
Gayan K. Kulatilleke, Sugandika Samarakoon

TL;DR
This paper empirically investigates how different classifier evaluation metrics behave in highly imbalanced and noisy credit card fraud detection datasets, proposing a combined F1 and g-mean metric as most effective.
Contribution
It develops a theoretical model of human annotation errors and class imbalance, then empirically compares classifier evaluation metrics to identify the most reliable one for fraud detection.
Findings
A combination of F1 score and g-mean outperforms the other metrics studied
Simulated human annotation errors affect how the metrics behave
Empirical results support using F1 and g-mean on imbalanced data
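To make the findings above concrete, here is a minimal Python sketch of the two metrics on a toy imbalanced dataset. It assumes the common definitions: F1 as the harmonic mean of precision and recall, and g-mean as the geometric mean of sensitivity and specificity. This summary does not state how the paper combines the two scores, so the sketch reports them separately. The helper `flip_labels` is a hypothetical uniform label-noise routine used only for illustration; it is not the paper's annotation error model.

```python
import math
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating label 1 (fraud) as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, tn, fn):
    # F1 = 2TP / (2TP + FP + FN); true negatives do not enter the formula.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(tp, fp, tn, fn):
    # Geometric mean of sensitivity (recall on fraud) and specificity.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def flip_labels(labels, rate, seed=0):
    """Hypothetical noise model: flip round(rate * n) labels chosen uniformly."""
    rng = random.Random(seed)
    k = round(rate * len(labels))
    noisy = list(labels)
    for i in rng.sample(range(len(labels)), k):
        noisy[i] = 1 - noisy[i]
    return noisy


# Toy data: 10 frauds among 1000 transactions (1% positive class).
y_true = [1] * 10 + [0] * 990

# Classifier A labels everything as legitimate.
always_legit = [0] * 1000

# Classifier B catches 8 of the 10 frauds and raises 20 false alarms.
some_skill = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970

for name, y_pred in [("always legitimate", always_legit), ("some skill", some_skill)]:
    counts = confusion_counts(y_true, y_pred)
    print(name, accuracy(*counts), f1_score(*counts), g_mean(*counts))

# Score classifier B against a noisy copy of the ground truth (1% flipped).
noisy_truth = flip_labels(y_true, rate=0.01, seed=42)
noisy_counts = confusion_counts(noisy_truth, some_skill)
print("some skill, noisy truth", f1_score(*noisy_counts), g_mean(*noisy_counts))
```

On this toy data the classifier that never flags fraud reaches 0.99 accuracy while its F1 and g-mean are both 0, which is the kind of behavior that makes accuracy unsuitable for heavily imbalanced fraud data.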
Abstract
With growing credit card transaction volumes, fraud percentages are also rising, along with the overhead costs institutions incur to combat fraud and compensate victims. The use of machine learning in the financial sector permits more effective protection against fraud and other economic crime. Suitably trained machine learning classifiers help proactive fraud detection, improving stakeholder trust and robustness against illicit transactions. However, the design of machine-learning-based fraud detection algorithms has been challenging and slow due to the massively unbalanced nature of fraud data and the difficulty of identifying frauds accurately and completely enough to create a gold-standard ground truth. Furthermore, there are no benchmarks or standard classifier evaluation metrics to measure and identify better-performing classifiers, thus keeping researchers in the dark. In this work, we…
Taxonomy
Topics: Imbalanced Data Classification Techniques · Financial Distress and Bankruptcy Prediction · Electricity Theft Detection Techniques
