Cross-table Synthetic Tabular Data Detection

G. Charbel N. Kindji (LACODAM); Lina Maria Rojas-Barahona; Elisa Fromont (LACODAM); Tanguy Urvoy

arXiv:2412.13227 · cs.LG · December 19, 2024

TL;DR

This paper investigates the challenge of detecting synthetic tabular data across diverse datasets and generators, proposing baseline methods and evaluation protocols to assess the difficulty of cross-table detection in real-world scenarios.

Contribution

It introduces three baseline detectors and four evaluation protocols to study the problem of cross-table synthetic data detection in varied and realistic settings.

Findings

01 Cross-table detection remains a challenging task.

02 Baseline detectors show limited effectiveness in "wild" scenarios.

03 Evaluation protocols highlight the variability in detection difficulty.

Abstract

Detecting synthetic tabular data is essential to prevent the distribution of false or manipulated datasets that could compromise data-driven decision-making. This study explores whether synthetic tabular data can be reliably identified "in the wild", meaning across different generators, domains, and table formats. This challenge is unique to tabular data, where structures (such as number of columns, data types, and formats) can vary widely from one table to another. We propose three cross-table baseline detectors and four distinct evaluation protocols, each corresponding to a different level of "wildness". Our very preliminary results confirm that cross-table adaptation is a challenging task.
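The abstract does not describe the detectors or protocols in detail, so the following is only an illustrative sketch of why varying table structure makes the problem hard, not the paper's method. It assumes one possible table-agnostic encoding (rows serialized to text, then hashed character trigrams) and one possible cross-table split (hold out a whole table at a time). The names `serialize_row`, `hashed_trigrams`, and `leave_one_table_out` are hypothetical.

```python
import zlib
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, object]


def serialize_row(row: Row) -> str:
    """Turn a row into text; columns are sorted so column order does not matter."""
    return " | ".join(f"{name}={row[name]}" for name in sorted(row))


def hashed_trigrams(text: str, dim: int = 64) -> List[int]:
    """Map text of any length to a fixed-size count vector of character trigrams.

    zlib.crc32 is used instead of hash() so the result is stable across runs.
    """
    vec = [0] * dim
    for i in range(len(text) - 2):
        bucket = zlib.crc32(text[i:i + 3].encode("utf-8")) % dim
        vec[bucket] += 1
    return vec


def leave_one_table_out(tables: Dict[str, list]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training table names, held-out table name) pairs.

    The held-out table is never seen in training, which is one way (assumed
    here) to simulate detection on an unseen table.
    """
    for held_out in sorted(tables):
        train = [name for name in sorted(tables) if name != held_out]
        yield train, held_out


# Two rows from tables with different schemas end up in the same feature space.
features_a = hashed_trigrams(serialize_row({"age": 31, "income": 52000.0}))
features_b = hashed_trigrams(serialize_row({"city": "Rennes", "zip": "35000", "owner": True}))
print(len(features_a), len(features_b))
```

Any classifier that accepts fixed-length vectors could then be trained on these features. Holding out entire tables, rather than random rows, keeps the test schema unseen during training.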

Taxonomy

Topics: Algorithms and Data Compression · Currency Recognition and Detection