Inference-Time Refinement Closes the Synthetic-Real Gap in Tabular Diffusion

Eugenio Lomurno; Filippo Balzarini; Francesco Benelle; Francesca Pia Panaccione; Matteo Matteucci

arXiv:2605.06261 · cs.LG · May 8, 2026

TL;DR

This paper introduces TARDIS, an inference-time refinement framework for tabular data synthesis that improves synthetic data utility by refining pre-trained diffusion models without retraining.

Contribution

It proposes a novel inference-time refinement method that combines a Tree-structured Parzen Estimator search with a Bidirectional Chamfer Refinement pattern to improve synthetic data quality.
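The summary names a Bidirectional Chamfer Refinement pattern but does not describe it. As a rough illustration only, the sketch below assumes the pattern builds on the standard symmetric (bidirectional) Chamfer distance between synthetic and real rows. The selector `chamfer_select` and its coverage-then-fidelity rule are hypothetical, invented for this example, and are not the authors' procedure.

```python
import numpy as np


def pairwise_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distance between every row of `a` and every row of `b` (shape: len(a) x len(b))."""
    diff = a[:, None, :] - b[None, :, :]  # broadcast to (n, m, d); fine for small tables
    return np.sqrt(np.einsum("nmd,nmd->nm", diff, diff))


def bidirectional_chamfer(synth: np.ndarray, real: np.ndarray) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbour distance synth->real plus real->synth."""
    d = pairwise_dists(synth, real)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


def chamfer_select(pool: np.ndarray, real: np.ndarray, n_keep: int) -> np.ndarray:
    """HYPOTHETICAL selector, not the paper's rule.

    real->synth pass: keep the nearest pool row of every real row (coverage).
    synth->real pass: fill the remaining slots with pool rows closest to the real data (fidelity).
    """
    d = pairwise_dists(pool, real)
    keep = list(dict.fromkeys(d.argmin(axis=0).tolist()))[:n_keep]  # unique, order-preserving
    chosen = set(keep)
    for idx in np.argsort(d.min(axis=1)).tolist():
        if len(keep) >= n_keep:
            break
        if idx not in chosen:
            keep.append(idx)
            chosen.add(idx)
    return pool[np.array(keep, dtype=int)]


# Toy usage: a wider-than-real synthetic pool is pruned to 100 rows.
rng = np.random.default_rng(0)
real = rng.normal(size=(50, 3))
pool = rng.normal(scale=1.5, size=(400, 3))
selected = chamfer_select(pool, real, n_keep=100)
print(bidirectional_chamfer(pool, real), bidirectional_chamfer(selected, real))
```

Because the coverage pass keeps every real row's nearest pool neighbour, the real->synth term is unchanged by this selection, and only the synth->real term can move. A refinement that scored only one direction could shrink toward a few dense regions and lose coverage of the real data.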

Findings

01

TARDIS achieves a median +8.6% improvement in downstream-task utility over models trained on real data.

02

It improves on the TabDiff baseline across all 15 datasets, with a mean gain of +12.9%.

03

Inference-time refinement reaches or exceeds real-data utility within 1 to 80 minutes on a consumer GPU.

Abstract

Diffusion-based generators set the current state of the art for synthetic tabular data. These methods approach but rarely exceed real-data utility, and closing this synthetic-real gap has so far been pursued exclusively at training time, via architectural advances, scaling, and retraining of monolithic generators. The inference-time alternative, i.e., refining the outputs of a pre-trained backbone with parameters left untouched, has remained largely unexplored for tabular synthesis. We introduce TARDIS (Tabular generation through Refinement, Distillation, and Inference-time Sampling), an inference-time refinement framework that operates on a frozen pre-trained backbone, configured per dataset by a Tree-structured Parzen Estimator search over score-level guidance during reverse diffusion, with each trial's objective set by an inner grid search over post-hoc sample selectors and an…
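The nested search described in the abstract can be sketched in NumPy, but only under stated assumptions. `frozen_backbone` is a toy stand-in for a frozen diffusion sampler whose only knob is a scalar guidance scale. It does not model reverse diffusion or score-level guidance. The selectors and `utility_proxy` are also invented for illustration. `tpe_suggest` is a simplified one-dimensional Tree-structured Parzen Estimator step: it splits trials into good and bad, fits Gaussian Parzen estimators, and picks the candidate maximising l(x)/g(x). It is not a production implementation such as Optuna's. The outer loop follows the abstract's structure: each TPE trial proposes a guidance value, and the trial's objective is the best score from an inner grid over post-hoc selectors.

```python
import numpy as np

rng = np.random.default_rng(0)
REAL = rng.normal(loc=1.0, size=(200, 2))  # toy stand-in for the real training table


def frozen_backbone(guidance: float, n: int, seed: int) -> np.ndarray:
    """HYPOTHETICAL frozen sampler: guidance moves the mean; too much or too little widens the spread."""
    g = np.random.default_rng(seed)
    return g.normal(loc=0.5 * guidance, scale=1.0 + 0.5 * abs(guidance - 2.0), size=(n, 2))


def trim_outliers(s):
    # Drop the 10% of rows farthest from the sample centroid.
    r = np.linalg.norm(s - s.mean(axis=0), axis=1)
    return s[r <= np.quantile(r, 0.9)]


def nearest_real(s):
    # Keep the half of rows whose nearest real row is closest.
    d = np.linalg.norm(s[:, None, :] - REAL[None, :, :], axis=2).min(axis=1)
    return s[np.argsort(d)[: len(s) // 2]]


SELECTORS = {"keep_all": lambda s: s, "trim_outliers": trim_outliers, "nearest_real": nearest_real}


def utility_proxy(s):
    # Illustrative score (higher is better): negative mismatch of per-column mean and std.
    return -(np.abs(s.mean(0) - REAL.mean(0)).sum() + np.abs(s.std(0) - REAL.std(0)).sum())


def inner_grid(guidance, seed):
    """Trial objective: best score over the selector grid, all applied to the same sample batch."""
    samples = frozen_backbone(guidance, n=300, seed=seed)
    scores = {name: utility_proxy(fn(samples)) for name, fn in SELECTORS.items()}
    best = max(scores, key=scores.get)
    return scores[best], best


def tpe_suggest(xs, ys, low, high, rng, gamma=0.25, n_candidates=24):
    """One simplified 1-D TPE step (maximisation): sample near good trials, pick argmax l(x)/g(x)."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    order = np.argsort(-ys)
    n_good = max(1, int(np.ceil(gamma * len(xs))))
    good, bad = xs[order[:n_good]], xs[order[n_good:]]
    bw = 0.1 * (high - low)  # fixed Parzen bandwidth (an assumption; real TPE adapts it)

    def parzen(points, x):
        # Mixture of Gaussians at `points` plus one uniform prior component, equal weights.
        k = np.exp(-0.5 * ((x[:, None] - points[None, :]) / bw) ** 2) / (bw * np.sqrt(2 * np.pi))
        return (k.sum(axis=1) + 1.0 / (high - low)) / (len(points) + 1)

    cand = np.clip(rng.choice(good, n_candidates) + rng.normal(scale=bw, size=n_candidates), low, high)
    return float(cand[np.argmax(parzen(good, cand) / parzen(bad, cand))])


def search(n_trials=30, n_startup=5, low=0.0, high=4.0, seed=0):
    rng = np.random.default_rng(seed)
    xs, ys, best = [], [], None
    for t in range(n_trials):
        # Random start-up trials, then TPE proposals for the guidance scale.
        g = rng.uniform(low, high) if t < n_startup else tpe_suggest(xs, ys, low, high, rng)
        score, selector = inner_grid(g, seed=t)
        xs.append(g)
        ys.append(score)
        if best is None or score > best[0]:
            best = (score, g, selector)
    return best


best_score, best_guidance, best_selector = search()
print(f"guidance={best_guidance:.2f} selector={best_selector} score={best_score:.3f}")
```

Sharing one sample batch across the selector grid means the inner search compares selectors rather than sampling noise. Only the outer TPE loop pays for new backbone calls, and no backbone parameter is ever updated.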