Synthetic Tabular Generators Fail to Preserve Behavioral Fraud Patterns: A Benchmark on Temporal, Velocity, and Multi-Account Signals

Bhavana Sajja

arXiv:2604.13125 · cs.LG · April 16, 2026

TL;DR

This paper introduces behavioral fidelity as a new evaluation dimension for synthetic tabular data, demonstrating that existing generators fail to preserve critical fraud detection signals across multiple datasets.

Contribution

It formalizes a taxonomy of four behavioral fraud patterns, proves limitations of row-independent generators, and benchmarks several generators, showing that they largely fail to reproduce key behavioral signals.

Findings

1. All tested generators fail to preserve behavioral fraud patterns, with degradation ratios up to 39x.
2. Row-independent generators cannot reproduce multi-account graph motifs or burst patterns.
3. The proposed evaluation framework is open source and applicable across domains with entity-level sequential data.
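Finding 2 can be illustrated with a toy Python sketch. This is not the paper's proof or benchmark code. The burst statistic is a hypothetical stand-in: the fraction of consecutive same-entity gaps at or below 60 s. To emulate a row-independent generator, the sketch permutes timestamps across rows. That keeps every column's marginal exactly but severs the ordering within each entity.

```python
import random
from collections import defaultdict

def burst_gap_fraction(rows, window_s=60.0):
    """Hypothetical burst statistic: fraction of consecutive same-entity
    inter-event gaps that are <= window_s seconds."""
    by_entity = defaultdict(list)
    for entity_id, ts in rows:
        by_entity[entity_id].append(ts)
    short = total = 0
    for times in by_entity.values():
        times.sort()
        for a, b in zip(times, times[1:]):
            total += 1
            short += (b - a) <= window_s  # bool counts as 0 or 1
    return short / total if total else 0.0

# Toy "real" log of (entity_id, seconds): 50 entities, each firing a
# 5-event burst with events 10 s apart; bursts start 1000 s apart.
real = [(e, e * 1000.0 + k * 10.0) for e in range(50) for k in range(5)]

# Emulated row-independent generator (an assumption for illustration):
# shuffle timestamps across rows, so the timestamp marginal is unchanged
# but each entity's events no longer come from one burst.
rng = random.Random(0)
timestamps = [ts for _, ts in real]
rng.shuffle(timestamps)
synthetic = [(e, ts) for (e, _), ts in zip(real, timestamps)]

print(burst_gap_fraction(real))       # 1.0
print(burst_gap_fraction(synthetic))  # much lower than the real value
```

The timestamp column has the same distribution in both tables, so a marginal-fidelity check on that column would pass. The per-entity burst signal, however, largely disappears.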

Abstract

We introduce behavioral fidelity -- a third evaluation dimension for synthetic tabular data that measures whether generated data preserves the temporal, sequential, and structural behavioral patterns that distinguish real-world entity activity. Existing frameworks evaluate statistical fidelity (marginal distributions and correlations) and downstream utility (classifier AUROC on synthetic-trained models), but neither tests for the behavioral signals that operational detection and analysis systems actually rely on. We formalize a taxonomy of four behavioral fraud patterns (P1-P4) covering inter-event timing, burst structure, multi-account graph motifs, and velocity-rule trigger rates; define a degradation ratio metric calibrated to a real-data noise floor (1.0 = matches real variability, k = k-times worse); and prove that row-independent generators -- the dominant paradigm -- are…
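The abstract describes the degradation ratio only at a high level. The minimal sketch below assumes one plausible reading of it. First, compute a behavioral statistic on two disjoint real splits to get a noise floor. Then divide the real-vs-synthetic discrepancy by that floor. Under this reading, 1.0 means "as different as real data is from itself". The statistic is a hypothetical P4-style velocity rule: more than `max_events` events in a trailing `window_s`-second window. The function names, thresholds, and `eps` guard are illustrative assumptions, not the paper's definitions.

```python
from collections import defaultdict, deque

def velocity_trigger_rate(rows, max_events=3, window_s=300.0):
    """Hypothetical P4-style rule: fraction of events at which the entity has
    more than `max_events` events in the trailing `window_s` seconds."""
    by_entity = defaultdict(list)
    for entity_id, ts in rows:
        by_entity[entity_id].append(ts)
    fired = total = 0
    for times in by_entity.values():
        recent = deque()  # timestamps inside the current trailing window
        for ts in sorted(times):
            recent.append(ts)
            while ts - recent[0] > window_s:
                recent.popleft()
            total += 1
            fired += len(recent) > max_events
    return fired / total if total else 0.0

def degradation_ratio(stat, real_a, real_b, synthetic, eps=1e-9):
    """Assumed form: |stat(real_a) - stat(synthetic)| divided by the
    real-vs-real noise floor |stat(real_a) - stat(real_b)|."""
    noise_floor = abs(stat(real_a) - stat(real_b))
    return abs(stat(real_a) - stat(synthetic)) / max(noise_floor, eps)

# Toy data: rows are (entity_id, seconds).
# real_a: 10 entities, 6 events 30 s apart -> rule fires on 3 of 6 events each.
real_a = [(e, k * 30.0) for e in range(10) for k in range(6)]
# real_b: same, except one entity spreads its events 1000 s apart.
real_b = [(e, k * 30.0) for e in range(9) for k in range(6)]
real_b += [(9, k * 1000.0) for k in range(6)]
# synthetic: every entity spread out, so the velocity rule never fires.
synthetic = [(e, k * 1000.0) for e in range(10) for k in range(6)]

print(velocity_trigger_rate(real_a), velocity_trigger_rate(real_b))  # 0.5 0.45
print(round(degradation_ratio(velocity_trigger_rate, real_a, real_b, synthetic), 2))  # 10.0
```

The `eps` guard only prevents division by zero. If the two real splits happen to agree exactly, any synthetic discrepancy yields a very large ratio. One option, an assumption not taken from the paper, is to average the noise floor over several random entity-level splits.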
