Fraud Dataset Benchmark and Applications
Prince Grover, Julia Xu, Justin Tittelfitz, Anqi Cheng, Zheng Li, Jakub Zablocki, Jianbo Liu, Hao Zhou

TL;DR
This paper introduces the Fraud Dataset Benchmark (FDB), a comprehensive collection of datasets and tools designed to advance research and development in fraud detection by addressing its unique challenges.
Contribution
The paper presents FDB, a standardized benchmark with datasets and a library, tailored for diverse fraud detection tasks and challenges, facilitating consistent evaluation and innovation.
Findings
Demonstrated applications include feature engineering and algorithm comparison.
Showcased methods for class imbalance and label noise handling.
Provided a unified platform for fraud detection research.
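As a rough illustration of the class imbalance handling mentioned in the findings, the sketch below computes inverse-frequency ("balanced") class weights in plain Python. It is not taken from the FDB library or the paper; the function name `balanced_class_weights` and the toy labels are assumptions made purely for illustration, and the weighting heuristic is one common choice among several.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)}.

    Hypothetical helper, not part of FDB. Rare classes receive
    larger weights so each class contributes equally in total.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Made-up toy labels: 1 = fraud, 0 = legitimate (2% fraud rate).
toy_labels = [1] * 2 + [0] * 98
weights = balanced_class_weights(toy_labels)
print(weights)
```

With weights like these, the total weight assigned to the fraud class equals the total assigned to the legitimate class, which is why many classifiers accept such values as per-class or per-sample weights during training.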
Abstract
Standardized datasets and benchmarks have spurred innovations in computer vision, natural language processing, multi-modal and tabular settings. We note that, compared to other well-researched fields, fraud detection has unique challenges: high class imbalance, diverse feature types, frequently changing fraud patterns, and the adversarial nature of the problem. Because of these challenges, modeling approaches evaluated on datasets from other research fields may not work well for fraud detection. In this paper, we introduce Fraud Dataset Benchmark (FDB), a compilation of publicly available datasets catered to fraud detection. FDB comprises a variety of fraud-related tasks, ranging from identifying fraudulent card-not-present transactions, detecting bot attacks, classifying malicious URLs, and estimating risk of loan default to content moderation. The Python-based library for FDB provides a consistent…
Taxonomy
Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Anomaly Detection Techniques and Applications
Methods: Lib
