Improve Fidelity and Utility of Synthetic Credit Card Transaction Time Series from Data-centric Perspective
Din-Yin Hsieh, Chi-Hua Wang, Guang Cheng

TL;DR
This paper improves the quality and usefulness of synthetic credit card transaction time series data by introducing new preprocessing methods and evaluating their impact on fraud detection models, providing practical guidelines for finance applications.
Contribution
It proposes five preprocessing schemas to enhance training of a conditional auto-regressive model for synthetic data generation and evaluates their effectiveness in real-world fraud detection tasks.
Findings
Incremental improvements in data fidelity and utility with new preprocessing schemas
Fraud detection models trained on synthetic data can perform comparably to models trained on real data
Guidelines for transitioning from real to synthetic datasets in finance applications
Abstract
Exploring generative model training for synthetic tabular data, specifically in sequential contexts such as credit card transaction data, presents significant challenges. This paper addresses these challenges, focusing on attaining both high fidelity to actual data and optimal utility for machine learning tasks. We introduce five pre-processing schemas to enhance the training of the Conditional Probabilistic Auto-Regressive Model (CPAR), demonstrating incremental improvements in the synthetic data's fidelity and utility. Upon achieving satisfactory fidelity levels, our attention shifts to training fraud detection models tailored for time-series data, evaluating the utility of the synthetic data. Our findings offer valuable insights and practical guidelines for synthetic data practitioners in the finance sector, transitioning from real to synthetic datasets for training purposes, and…
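The utility evaluation described in the abstract is commonly framed as "train on synthetic, test on real": fit one model on real data and another on synthetic data, then score both on the same held-out real data. The sketch below illustrates that protocol only. It is not the paper's method: the `(amount, is_fraud)` row layout, the function names, and the single-threshold "model" are all hypothetical stand-ins chosen to keep the example self-contained, whereas the paper trains fraud detection models tailored for time-series data.

```python
from typing import List, Tuple

# Hypothetical toy schema: (transaction amount, is_fraud label).
Row = Tuple[float, int]


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for the positive (fraud) class; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit_threshold(train: List[Row]) -> float:
    """Stand-in 'model': pick the amount threshold that maximizes F1 on the training rows."""
    labels = [y for _, y in train]
    best_t, best_f1 = float("inf"), -1.0
    for t in sorted({amount for amount, _ in train}):
        preds = [1 if amount >= t else 0 for amount, _ in train]
        score = f1_score(labels, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t


def evaluate(threshold: float, test: List[Row]) -> float:
    """Score a fitted threshold on held-out rows."""
    labels = [y for _, y in test]
    preds = [1 if amount >= threshold else 0 for amount, _ in test]
    return f1_score(labels, preds)


def utility_gap(
    real_train: List[Row], synthetic_train: List[Row], real_test: List[Row]
) -> Tuple[float, float]:
    """Return (F1 when trained on real, F1 when trained on synthetic), both scored on real_test."""
    return (
        evaluate(fit_threshold(real_train), real_test),
        evaluate(fit_threshold(synthetic_train), real_test),
    )
```

Under this framing, a small difference between the two returned scores suggests the synthetic data is a usable substitute for the real training data on this task. The test set must always be real data, so that both models are judged against the distribution they would face in deployment.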
Taxonomy
Topics: Imbalanced Data Classification Techniques · Artificial Intelligence in Law · Financial Distress and Bankruptcy Prediction
