Structured Evaluation of Synthetic Tabular Data
Scott Cheng-Hsin Yang, Baxter Eaves, Michael Schmidt, Ken Swanson, and Patrick Shafto

TL;DR
This paper introduces a unified evaluation framework for synthetic tabular data based on the principle that generated data should match the distribution of real data, enabling better assessment of data quality and generator performance.
Contribution
It proposes a coherent, mathematically grounded evaluation framework that unifies existing metrics and introduces new model-free baselines for synthetic tabular data quality assessment.
Findings
Structured synthesizers outperform deep learning methods on small datasets
The framework reveals the completeness and limitations of existing metrics
Model-based and model-free metrics are unified under a single objective
Abstract
Tabular data is common yet typically incomplete, small in volume, and access-restricted due to privacy concerns. Synthetic data generation offers potential solutions. Many metrics exist for evaluating the quality of synthetic tabular data; however, we lack an objective, coherent interpretation of the many metrics. To address this issue, we propose an evaluation framework with a single, mathematical objective that posits that the synthetic data should be drawn from the same distribution as the observed data. Through various structural decompositions of the objective, this framework allows us to reason, for the first time, about the completeness of any set of metrics, and it unifies existing metrics, including those that stem from fidelity considerations, downstream application, and model-based approaches. Moreover, the framework motivates model-free baselines and a new spectrum of metrics. We…
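As a rough illustration of the objective stated in the abstract (synthetic data should follow the same distribution as the observed data), the simplest check compares one column at a time. The sketch below is a minimal example under stated assumptions: columns are categorical, rows are plain dictionaries, and total variation distance is an illustrative choice of discrepancy, not necessarily one of the paper's metrics. The function names `marginal_tvd` and `per_column_tvd` are hypothetical.

```python
from collections import Counter


def marginal_tvd(real_col, synth_col):
    """Total variation distance between two empirical categorical distributions.

    Returns 0.0 when the empirical frequencies match exactly and 1.0 when
    the two columns share no values. (Illustrative metric, assumed here.)
    """
    p = Counter(real_col)
    q = Counter(synth_col)
    n_p, n_q = len(real_col), len(synth_col)
    support = set(p) | set(q)
    # Counter returns 0 for values missing from one of the columns.
    return 0.5 * sum(abs(p[v] / n_p - q[v] / n_q) for v in support)


def per_column_tvd(real_rows, synth_rows):
    """Apply marginal_tvd to every column of two tables given as lists of dicts."""
    columns = list(real_rows[0].keys())
    return {
        c: marginal_tvd([r[c] for r in real_rows], [s[c] for s in synth_rows])
        for c in columns
    }


if __name__ == "__main__":
    real = [{"color": "red", "size": "S"}, {"color": "blue", "size": "L"}]
    synth = [{"color": "red", "size": "L"}, {"color": "blue", "size": "L"}]
    print(per_column_tvd(real, synth))
```

A per-column check like this only looks at one-dimensional marginals, so it cannot detect mismatched relationships between columns; capturing those requires metrics over pairs or larger groups of features.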
Peer Reviews
Decision: Submitted to ICLR 2024
The main strength of the paper is that such work is very much needed. Additionally, I like the idea of arranging the metrics according to a spectrum that highlights the complexity of the relationships among features that a metric is able to capture.
I think the paper needs a lot of rewriting (and probably more space; I would suggest that the authors submit to a journal). At times it is quite difficult to follow, and many of the metrics presented in Table 1 are not covered at all in the main text. Also, it is not realistic to expect that one will evaluate their models according to all the metrics shown in Table 1. A significant contribution would be to identify different subsets of these metrics and show how to use them together to
The topic of data synthesis is highly relevant for many real-world applications where data is very costly to obtain or privacy is a major concern. The presentation is well-structured and detailed. The authors have taken a systematic approach to evaluate different synthesizers in a comparative way. They have considered different metrics and provided clear explanations for their choices.
Although well-structured, the presentation is quite dense, and it might be challenging for someone without a background in the area to understand the differences and significance of the analysis framework and findings. The paper has a strong focus on the technical evaluation of synthesizers, but it doesn't discuss the practical implications of the findings. I.e., how might these results impact real-world applications of these synthesizers? It would be useful to know how the methods would have pe
I find the authors' incorporation of Probabilistic Cross-categorization (a domain-general method designed for characterizing the full joint distribution of variables in high-dimensional datasets) particularly intriguing, especially in the realm of tabular data generation. This is my first encounter with this approach in a benchmarking context, and its novelty in the authors' work is commendable.
I commend the authors' efforts in detailing various metrics, and particularly their exploration of the nuances between model-free metrics and the PCC-based metric. A deeper elaboration on this distinction would be immensely helpful for readers to fully grasp the nuances. The representation in Figure 1B, specifically regarding the spectrums, may benefit from further context or an enriched explanation. This raises a query: Are the authors implying that model-free evaluations, such as those
Taxonomy
Topics: Geographic Information Systems Studies
Methods: Sparse Evolutionary Training
