Repurposing Image Diffusion Models for Adversarial Synthetic Structured Data: A Case Study of Ground Truth Drift
Adam Arthur, Christopher Schwartz

TL;DR
This paper demonstrates that off-the-shelf image diffusion models can be repurposed to generate adversarial synthetic structured data. The result exposes a vulnerability in AI data pipelines and motivates a distinction between statistical and perceptual realism.
Contribution
The paper shows how a standard image diffusion model can be adapted to create adversarial structured data, and it introduces two conceptual distinctions: statistical versus perceptual realism, and synthetic evidence versus synthetic media.
Findings
An image diffusion model can generate structured data once each row of a tabular dataset is reshaped into a pseudo-image.
Synthetic evidence can induce ground truth drift in AI pipelines.
The architecture's spatial bias influences feature placement in generated data.
Abstract
Public image diffusion models are now powerful enough that an attacker without the resources to train a tabular-specific generator may repurpose one off the shelf. This study tests that possibility directly. An unmodified Stable Diffusion U-Net is applied to the UCI Adult Income dataset by reshaping each row into a small single-channel pseudo-image. The architecture's inductive bias toward spatial locality makes feature placement a design variable, and several layouts are tested. The paper also draws two philosophical distinctions. One separates statistical from perceptual realism: whether synthetic content holds up to a machine's correlation audits or a human's sensory inspection. The other introduces synthetic evidence as a category alongside synthetic media: AI-generated material whose consumer is a machine in a closed evidentiary…
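The row-to-pseudo-image step described in the abstract can be illustrated with a minimal sketch. This is not the authors' preprocessing: the function names `row_to_pseudo_image` and `pseudo_image_to_row`, the 14-feature row, the 4x4 grid, the zero padding, and the assumption that features are already encoded and scaled to [-1, 1] are all hypothetical choices made for illustration. The `layout` permutation stands in for the feature-placement design variable.

```python
import numpy as np


def row_to_pseudo_image(row, layout, side):
    """Place a 1-D feature vector on a (1, side, side) single-channel grid.

    layout[i] is the index of the feature written to grid cell i (row-major),
    so changing `layout` changes which features end up spatially adjacent.
    Cells beyond the number of features are zero-padded (an assumption).
    """
    row = np.asarray(row, dtype=np.float32)
    layout = np.asarray(layout)
    if sorted(layout.tolist()) != list(range(row.size)):
        raise ValueError("layout must be a permutation of the feature indices")
    if row.size > side * side:
        raise ValueError("grid is too small for the number of features")
    flat = np.zeros(side * side, dtype=np.float32)
    flat[: row.size] = row[layout]
    return flat.reshape(1, side, side)


def pseudo_image_to_row(image, layout):
    """Invert row_to_pseudo_image: read the grid back into feature order."""
    layout = np.asarray(layout)
    flat = np.asarray(image).reshape(-1)[: layout.size]
    row = np.empty(layout.size, dtype=flat.dtype)
    row[layout] = flat  # cell i held feature layout[i]
    return row


# Hypothetical example: 14 pre-scaled features on a 4x4 grid, reversed layout.
example_row = np.linspace(-1.0, 1.0, 14)
example_layout = np.arange(14)[::-1]
example_image = row_to_pseudo_image(example_row, example_layout, side=4)
recovered_row = pseudo_image_to_row(example_image, example_layout)
```

Because the mapping is a permutation plus padding, it is lossless: a generated grid can be read back into a row with the same layout, which is what lets different layouts be compared on equal terms.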
