SEDGE: Structural Extrapolated Data Generation
Kun Zhang, Jiaqi Sun, Yiqing Li, Ignavier Ng, Namrata Deka, Shaoan Xie

TL;DR
SEDGE introduces a framework for generating data beyond training samples by leveraging assumptions about data structure, with practical algorithms validated on synthetic and image data.
Contribution
It provides theoretical conditions under which data satisfying novel specifications can be generated reliably, and develops practical algorithms based on structure-informed optimization or diffusion posterior sampling.
Findings
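Extrapolation performance is verified on synthetic data
Extrapolated image generation is considered as a real-world scenario
The distribution of extrapolated data is approximately identifiable under certain "conservative" assumptions and inherently non-identifiable without them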
Abstract
This paper aims to address the challenge of data generation beyond the training data and proposes a framework for Structural Extrapolated Data GEneration (SEDGE) based on suitable assumptions on the underlying data-generating process. We provide conditions under which data satisfying novel specifications can be generated reliably, together with the approximate identifiability of the distribution of such data under certain "conservative" assumptions, as well as the inherent non-identifiability of this distribution without such assumptions. On the algorithmic side, we develop practical methods to achieve extrapolated data generation, based on a structure-informed optimization strategy or diffusion posterior sampling, respectively. We verify the extrapolation performance on synthetic data and also consider extrapolated image generation as a real-world scenario to illustrate the validity…
