Downstream Task-Oriented Generative Model Selections on Synthetic Data Training for Fraud Detection Models
Yinan Cheng, Chi-Hua Wang, Vamsi K. Potluru, Tucker Balch, Guang Cheng

TL;DR
This paper investigates how to select the most suitable generative models for synthetic data in fraud detection, emphasizing interpretability constraints and providing practical guidance for replacing real datasets.
Contribution
It offers a comparative analysis of Neural Network and Bayesian Network generative models for synthetic data in fraud detection, considering interpretability and performance constraints.
Findings
BN-based models outperform NN-based models under strict interpretability constraints.
Both NN and BN models are effective under loose interpretability constraints.
Guidelines are provided for practitioners on selecting generative models for synthetic training data.
Abstract
Devising procedures for downstream task-oriented generative model selections is an unresolved problem of practical importance. Existing studies have focused on the utility of a single family of generative models. They provide limited insight into how synthetic data practitioners should select the best family of generative models for synthetic training tasks given a specific combination of machine learning model class and performance metric. In this paper, we approach the downstream task-oriented generative model selections problem in the case of training fraud detection models and investigate the best practice under different combinations of model interpretability and model performance constraints. Our investigation supports that, while Neural Network (NN)-based and Bayesian Network (BN)-based generative models are both good at completing the synthetic training task under loose model interpretability…
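The selection problem described in the abstract can be illustrated with a small train-on-synthetic, test-on-real loop. The sketch below is not the authors' procedure: the function names (`fit_threshold`, `recall`, `select_generator`), the one-feature threshold classifier, and the hand-written toy datasets are all hypothetical stand-ins, chosen only to show how a downstream model class and a performance metric can drive the choice between candidate generators.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, int]  # (single feature value, label: 1 = fraud, 0 = legitimate)


def fit_threshold(train: List[Row]) -> float:
    """Toy interpretable model (an assumption): midpoint between the class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def recall(threshold: float, test: List[Row]) -> float:
    """Fraction of real fraud cases flagged when predicting fraud for x > threshold."""
    frauds = [x for x, y in test if y == 1]
    if not frauds:
        return 0.0
    return sum(1 for x in frauds if x > threshold) / len(frauds)


def select_generator(
    synthetic_sets: Dict[str, List[Row]],
    real_test: List[Row],
    fit: Callable[[List[Row]], float],
    metric: Callable[[float, List[Row]], float],
) -> Tuple[str, Dict[str, float]]:
    """Train on each candidate's synthetic data, score on real data, keep the best."""
    scores = {name: metric(fit(data), real_test) for name, data in synthetic_sets.items()}
    best = max(scores, key=lambda name: scores[name])
    return best, scores


# Hand-written toy data (hypothetical, not from the paper).
real_test = [(0.1, 0), (-0.2, 0), (0.4, 0), (2.8, 1), (3.1, 1), (2.2, 1)]
synthetic_sets = {
    "candidate_a": [(0.0, 0), (0.2, 0), (3.0, 1), (2.9, 1)],  # close to the real data
    "candidate_b": [(0.0, 0), (0.2, 0), (6.0, 1), (5.8, 1)],  # fraud class distorted
}

best, scores = select_generator(synthetic_sets, real_test, fit_threshold, recall)
print(best, scores)
```

Because `fit` and `metric` are passed in as arguments, the same loop can be rerun for a different model class or performance metric, which is the combination the abstract says the selection should depend on.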
Taxonomy
Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)
