Robust Detection of Synthetic Tabular Data under Schema Variability
G. Charbel N. Kindji (MALT), Elisa Fromont (MALT), Lina Maria Rojas-Barahona, Tanguy Urvoy

TL;DR
This paper presents a transformer-based method for detecting synthetic tabular data when the detector is deployed on tables with variable, previously unseen schemas. It outperforms the only previously published baseline by 7 points in both AUC and accuracy, and a table-adaptation component adds a further 7 accuracy points.
Contribution
Introduces a novel datum-wise transformer architecture with table-adaptation for robust detection of synthetic tabular data in unseen schemas.
Findings
Outperforms the only previously published baseline by 7 points in both AUC and accuracy
Table-adaptation component adds a further 7 accuracy points
Provides strong evidence that detecting synthetic tabular data is feasible in real-world scenarios
Abstract
The rise of powerful generative models has sparked concerns over data authenticity. While detection methods have been extensively developed for images and text, the case of tabular data, despite its ubiquity, has been largely overlooked. Yet, detecting synthetic tabular data is especially challenging due to its heterogeneous structure and unseen formats at test time. We address the underexplored task of detecting synthetic tabular data "in the wild", i.e., when the detector is deployed on tables with variable and previously unseen schemas. We introduce a novel datum-wise transformer architecture that significantly outperforms the only previously published baseline, improving both AUC and accuracy by 7 points. By incorporating a table-adaptation component, our model gains an additional 7 accuracy points, demonstrating enhanced robustness. This work provides the first strong evidence…
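To make the "in the wild" setting and the reported metrics concrete, here is a minimal sketch of an evaluation in which whole tables are held out, so every test schema is unseen during training, followed by plain implementations of accuracy and AUC. It is an illustration under stated assumptions, not the authors' protocol: the row format (a dict with a `table_id` key), the function names, the 0.5 decision threshold, and the split fraction are all hypothetical.

```python
import random


def split_by_table(rows, test_fraction=0.25, seed=0):
    """Hold out whole tables so that no test schema is seen in training.

    `rows` is a list of dicts, each with a hypothetical "table_id" key.
    """
    table_ids = sorted({r["table_id"] for r in rows})
    rng = random.Random(seed)
    rng.shuffle(table_ids)
    n_test = max(1, round(len(table_ids) * test_fraction))
    test_ids = set(table_ids[:n_test])
    train = [r for r in rows if r["table_id"] not in test_ids]
    test = [r for r in rows if r["table_id"] in test_ids]
    return train, test


def accuracy(labels, scores, threshold=0.5):
    """Fraction of rows classified correctly (label 1 = synthetic, 0 = real)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(labels, scores):
    """Probability that a random synthetic row outscores a random real row.

    Ties count as 0.5. Quadratic in the number of rows, which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one row of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Both functions return values between 0 and 1, so a gain of "7 points" would normally correspond to 0.07 on this scale. Splitting by table rather than by row is the design choice that matters here: a row-level split would let the detector see every schema during training and would not test the unseen-schema condition.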
Taxonomy
Topics: Face and Expression Recognition · Bayesian Modeling and Causal Inference · Advanced Statistical Methods and Models
