SAGE: Sparse Adaptive Guidance for Dependency-Aware Tabular Data Generation
Shuo Yang, Zheyu Zhang, Bardh Prenkaj, Gjergji Kasneci

TL;DR
SAGE is an LLM-based framework for generating high-fidelity synthetic tabular data. It enforces sparse, dynamic feature dependencies during generation, which improves data utility and reduces policy violations.
Contribution
The paper proposes a method that models feature dependencies sparsely and adaptively, addressing the limitations of earlier approaches that treat dependencies as dense and static.
Findings
Boosts F1 scores by 10% over previous methods
Reduces policy violations by one point
Improves data fidelity and downstream utility
Abstract
Generating high-fidelity synthetic tabular data remains a critical challenge for enhancing data availability in privacy-sensitive and low-resource domains. Recent approaches leverage LLMs by representing table rows as sequences, yet suffer from two fundamental limitations: (1) they model feature dependencies densely, introducing spurious correlations; and (2) they assume static relationships between features, ignoring how these dependencies vary with feature values. To overcome these limitations, we introduce SAGE (Sparse Adaptive Guidance), a novel LLM-based generation framework that enforces sparse and dynamic dependency guidance. SAGE discretizes features into value-aware pseudo-features and constructs a mutual information-based sparse dependency graph. This graph adaptively guides generation through explicit context selection or implicit logit correction, enabling LLMs to focus on…
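The graph-construction step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the quantile binning rule, the `top_k` and `min_mi` parameters, the function names, and the toy rows are all assumptions made for illustration, and the sketch stops at building the graph (it does not cover context selection or logit correction). It discretizes each column into "column=value" pseudo-features, scores pairs of pseudo-features from different columns by empirical mutual information, and keeps only the strongest edges.

```python
import math

def quantile_bins(values, n_bins=2):
    """Map each numeric value to a bin index in 0..n_bins-1 (assumed binning rule)."""
    ranked = sorted(values)
    cuts = [ranked[len(ranked) * i // n_bins] for i in range(1, n_bins)]
    return [sum(v >= c for c in cuts) for v in values]

def pseudo_features(rows, numeric_cols, n_bins=2):
    """Turn each row into a set of 'column=value' pseudo-feature names."""
    binned = {c: quantile_bins([r[c] for r in rows], n_bins) for c in numeric_cols}
    result = []
    for i, row in enumerate(rows):
        names = set()
        for col, val in row.items():
            label = f"bin{binned[col][i]}" if col in binned else str(val)
            names.add(f"{col}={label}")
        result.append(names)
    return result

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two binary sequences."""
    n = len(xs)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for x, y in zip(xs, ys) if x == a and y == b) / n
            p_a = sum(1 for x in xs if x == a) / n
            p_b = sum(1 for y in ys if y == b) / n
            if p_ab > 0:
                mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi

def sparse_graph(rows, numeric_cols, top_k=2, min_mi=0.1, n_bins=2):
    """For each pseudo-feature, keep at most top_k cross-column neighbors with MI >= min_mi."""
    feats = pseudo_features(rows, numeric_cols, n_bins)
    names = sorted(set().union(*feats))
    indicators = {f: [int(f in s) for s in feats] for f in names}
    graph = {}
    for f in names:
        scored = []
        for g in names:
            # Skip pseudo-features derived from the same original column.
            if f.split("=", 1)[0] == g.split("=", 1)[0]:
                continue
            scored.append((mutual_information(indicators[f], indicators[g]), g))
        # Highest MI first; ties broken alphabetically for determinism.
        scored.sort(key=lambda t: (-t[0], t[1]))
        graph[f] = [g for mi, g in scored[:top_k] if mi >= min_mi]
    return graph

# Toy data (hypothetical): job tracks age closely, income does not.
rows = [
    {"age": 20, "job": "student", "income": 10},
    {"age": 22, "job": "student", "income": 90},
    {"age": 24, "job": "student", "income": 12},
    {"age": 50, "job": "engineer", "income": 95},
    {"age": 55, "job": "engineer", "income": 11},
    {"age": 60, "job": "engineer", "income": 92},
]
graph = sparse_graph(rows, numeric_cols=["age", "income"])
for node, neighbors in graph.items():
    print(node, "->", neighbors)
```

In this toy table the age and job pseudo-features end up connected to each other, while the income pseudo-features keep no neighbors because their mutual information with everything else falls below the assumed threshold. A generator guided by such a graph would condition each feature only on its retained neighbors rather than on the full row.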
