Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints
Mihaela Cătălina Stoian, Eleonora Giunchiglia

TL;DR
This paper introduces the Disjunctive Refinement Layer (DRL), a novel method that ensures deep generative models produce realistic, constraint-compliant tabular data, significantly improving data quality and downstream task performance.
Contribution
The paper presents DRL, the first layer that lets deep generative models automatically satisfy expressive linear constraints in tabular data generation, including constraints that define non-convex and even disconnected feasible regions.
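The sketch below shows why disjunctions matter. A conjunction of linear inequalities always describes a convex region. A disjunction of such conjunctions can describe a region with gaps in it. It is a minimal illustration that assumes the formula has already been written in disjunctive normal form using only non-strict inequalities (`<=`). The encoding, the names `holds` and `satisfies`, and the example features `x0` and `x1` are hypothetical. None of them come from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: (coeffs, bound) means sum_i coeffs[i] * x[i] <= bound.
Inequality = Tuple[Sequence[float], float]
# A conjunction of inequalities describes a convex polytope.
Conjunction = List[Inequality]
# A disjunction of conjunctions (DNF) can describe a non-convex, even disconnected, region.
Formula = List[Conjunction]


def holds(ineq: Inequality, x: Sequence[float], tol: float = 1e-9) -> bool:
    """Check a single linear inequality, with a small tolerance for float error."""
    coeffs, bound = ineq
    return sum(c * v for c, v in zip(coeffs, x)) <= bound + tol


def satisfies(formula: Formula, x: Sequence[float]) -> bool:
    """True if x satisfies every inequality of at least one disjunct."""
    return any(all(holds(ineq, x) for ineq in conj) for conj in formula)


# (x0 <= 1 AND x1 >= 0)  OR  (x0 >= 3 AND x0 + x1 <= 10)
# ">=" constraints are negated into "<=" form, e.g. x1 >= 0 becomes -x1 <= 0.
formula: Formula = [
    [((1.0, 0.0), 1.0), ((0.0, -1.0), 0.0)],
    [((-1.0, 0.0), -3.0), ((1.0, 1.0), 10.0)],
]

print(satisfies(formula, [0.5, 2.0]))  # True: first disjunct holds
print(satisfies(formula, [2.0, 2.0]))  # False: 1 < x0 < 3 falls in the gap
print(satisfies(formula, [4.0, 5.0]))  # True: second disjunct holds (4 + 5 <= 10)
```

The values of `x0` allowed by this formula split into two separate pieces: `x0 <= 1` and `x0 >= 3`. No single conjunction of linear inequalities can describe that gap.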
Findings
DRL guarantees constraint satisfaction in the generated data, eliminating all violations of user-defined constraints.
DRL improves downstream task performance by up to 21.4% in F1-score and 20.9% in AUC.
Abstract
Synthetic tabular data generation has traditionally been a challenging problem due to the high complexity of the underlying distributions that characterise this type of data. Despite recent advances in deep generative models (DGMs), existing methods often fail to produce realistic datapoints that are well-aligned with available background knowledge. In this paper, we address this limitation by introducing the Disjunctive Refinement Layer (DRL), a novel layer designed to enforce the alignment of generated data with the background knowledge specified in user-defined constraints. DRL is the first method able to automatically make deep learning models inherently compliant with constraints as expressive as quantifier-free linear formulas, which can define non-convex and even disconnected spaces. Our experimental analysis shows that DRL not only guarantees constraint satisfaction but also…
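The abstract does not describe how DRL repairs a datapoint. The sketch below shows only one simple, related operation, under stated assumptions. Restricted to a single feature, a quantifier-free linear formula with non-strict inequalities allows a union of closed intervals. A layer could map a generated value to the nearest point in that union and leave values that are already feasible unchanged. The function name `project_to_union` and the tie-breaking rule are hypothetical. This is not the paper's algorithm.

```python
import math
from typing import List, Tuple

# Hypothetical: a closed interval [lo, hi]; use math.inf for unbounded ends.
Interval = Tuple[float, float]


def project_to_union(value: float, intervals: List[Interval]) -> float:
    """Return the point of a union of closed intervals that is closest to `value`.

    Values already inside some interval are returned unchanged. Ties are
    broken in favour of the interval listed first.
    """
    if not intervals:
        raise ValueError("empty feasible set: the constraints are unsatisfiable")
    best, best_dist = value, math.inf
    for lo, hi in intervals:
        candidate = min(max(value, lo), hi)  # clamp the value into [lo, hi]
        dist = abs(candidate - value)
        if dist < best_dist:  # strict "<" keeps the earlier interval on ties
            best, best_dist = candidate, dist
    return best


# Feasible set for x0 in a disconnected example: (-inf, 1] U [3, inf)
feasible = [(-math.inf, 1.0), (3.0, math.inf)]
print(project_to_union(2.2, feasible))  # 3.0: closer to the right piece
print(project_to_union(1.8, feasible))  # 1.0: closer to the left piece
print(project_to_union(0.5, feasible))  # 0.5: already feasible, unchanged
```

Because the feasible set has a gap, no single clamp to one interval works. The repair has to choose between pieces, which is the difficulty that constraints limited to convex regions avoid.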
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Fault Detection and Control Systems
