Secure and Explainable Fraud Detection in Finance via Hierarchical Multi-source Dataset Distillation
Yiming Qian, Thorsten Neumann, Xueyining Huang, David Hardoon, Fei Gao, Yong Liu, Siow Mong Rick Goh

TL;DR
This paper introduces a privacy-preserving, explainable dataset distillation method for financial fraud detection that maintains high accuracy, enhances interpretability, and ensures low risk of data leakage across institutions.
Contribution
The paper presents a hierarchical multi-source dataset distillation framework that converts trained random forests into transparent rule regions and generates synthetic data within those regions for collaborative fraud detection.
Findings
Reduces data volume by 85% to 93% on the IEEE-CIS fraud dataset while maintaining accuracy.
Improves cross-institution detection performance with synthetic data.
Ensures low membership inference risk, enhancing privacy.
Abstract
We propose an explainable, privacy-preserving dataset distillation framework for collaborative financial fraud detection. A trained random forest is converted into transparent, axis-aligned rule regions (leaf hyperrectangles), and synthetic transactions are generated by uniformly sampling within each region. This produces a compact, auditable surrogate dataset that preserves local feature interactions without exposing sensitive original records. The rule regions also support explainability: aggregated rule statistics (for example, support and lift) describe global patterns, while assigning each case to its generating region gives concise human-readable rationales and calibrated uncertainty based on tree-vote disagreement. On the IEEE-CIS fraud dataset (590k transactions across three institution-like clusters), distilled datasets reduce data volume by 85% to 93% (often under 15% of the…
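The region-extraction and sampling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tree `TREE`, the global feature ranges `BOUNDS`, and the helpers `leaf_regions`, `distill`, and `predict` are all hypothetical, and a single hand-written tree stands in for one member of a trained random forest. The sketch walks the tree, narrows an axis-aligned box at each split to obtain one hyperrectangle per leaf, and then draws synthetic rows uniformly inside each box.

```python
import random

# Hypothetical toy tree (assumption): internal nodes split one feature at a
# threshold; leaves carry a class label (1 = fraud) and a training-row count.
TREE = {
    "feature": 0, "threshold": 100.0,          # feature 0: e.g. an amount
    "left": {"label": 0, "n": 80},
    "right": {
        "feature": 1, "threshold": 0.5,        # feature 1: e.g. a risk score
        "left": {"label": 0, "n": 15},
        "right": {"label": 1, "n": 5},
    },
}

# Assumed global (low, high) range per feature; real ranges would come from
# the training data or from domain limits.
BOUNDS = [(0.0, 500.0), (0.0, 1.0)]


def leaf_regions(node, box):
    """Yield (box, label, n) per leaf, where box is a list of (low, high)."""
    if "label" in node:
        yield list(box), node["label"], node["n"]
        return
    f, t = node["feature"], node["threshold"]
    lo, hi = box[f]
    left_box = list(box)
    left_box[f] = (lo, min(hi, t))             # left branch: x[f] <= t
    right_box = list(box)
    right_box[f] = (max(lo, t), hi)            # right branch: x[f] > t
    yield from leaf_regions(node["left"], left_box)
    yield from leaf_regions(node["right"], right_box)


def distill(tree, bounds, per_leaf, seed=0):
    """Draw per_leaf synthetic rows uniformly inside every leaf region."""
    rng = random.Random(seed)
    rows = []
    for box, label, _n in leaf_regions(tree, bounds):
        for _ in range(per_leaf):
            x = [rng.uniform(lo, hi) for lo, hi in box]
            rows.append((x, label))
    return rows


def predict(node, x):
    """Route a row down the tree and return the label of the leaf it reaches."""
    while "label" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]


if __name__ == "__main__":
    for box, label, n in leaf_regions(TREE, BOUNDS):
        print(box, "label:", label, "training rows:", n)
    print(len(distill(TREE, BOUNDS, per_leaf=4)), "synthetic rows")
```

Because every synthetic row is sampled inside the box of the leaf that labels it, routing the row back through the same tree recovers that label, and no original record is copied into the distilled set. A forest-level version would repeat this per tree or intersect regions across trees; which of these the paper uses is not stated in the text above.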
Taxonomy
Topics: Imbalanced Data Classification Techniques · Financial Distress and Bankruptcy Prediction · Explainable Artificial Intelligence (XAI)
