FiFAR: A Fraud Detection Dataset for Learning to Defer

Jean V. Alves; Diogo Leitão; Sérgio Jesus; Marco O. P. Sampaio; Pedro Saleiro; Mário A. T. Figueiredo; Pedro Bizarro

arXiv:2312.13218 · cs.LG · December 21, 2023 · 1 citation

TL;DR

The paper introduces FiFAR, a synthetic dataset for learning to defer in fraud detection, enabling benchmarking of human-AI collaboration methods under realistic constraints.

Contribution

The paper provides the first publicly available synthetic fraud detection dataset for learning to defer (L2D) research that includes predictions from simulated human experts and capacity constraints.

Findings

1. Developed a capacity-aware L2D method.

2. Benchmarking the method across 300 scenarios shows its effectiveness.

3. The dataset facilitates systematic evaluation of human-AI teaming.

Abstract

Public dataset limitations have significantly hindered the development and benchmarking of learning to defer (L2D) algorithms, which aim to optimally combine human and AI capabilities in hybrid decision-making systems. In such systems, human availability and domain-specific concerns introduce difficulties, while obtaining human predictions for training and evaluation is costly. Financial fraud detection is a high-stakes setting where algorithms and human experts often work in tandem; however, there are no publicly available datasets for L2D concerning this important application of human-AI teaming. To fill this gap in L2D research, we introduce the Financial Fraud Alert Review Dataset (FiFAR), a synthetic bank account fraud detection dataset, containing the predictions of a team of 50 highly complex and varied synthetic fraud analysts, with varied bias and feature dependence. We also…

Code & Models

Repositories

feedzai/fifar-dataset (official)

Taxonomy

Topics: Imbalanced Data Classification Techniques · Ethics and Social Impacts of AI · Artificial Intelligence in Law