SEDGE: Structural Extrapolated Data Generation

Kun Zhang; Jiaqi Sun; Yiqing Li; Ignavier Ng; Namrata Deka; Shaoan Xie

arXiv:2604.02482·cs.LG·May 15, 2026

TL;DR

SEDGE is a framework for generating data beyond the training data by leveraging assumptions on the underlying data-generating process, with practical algorithms validated on synthetic and image data.

Contribution

The paper provides conditions under which data satisfying novel specifications can be generated reliably, and develops algorithms based on structure-informed optimization and diffusion posterior sampling.

Findings

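01 Extrapolation performance is verified on synthetic data.

02 Extrapolated image generation is considered as a real-world scenario.

03 The distribution of extrapolated data is approximately identifiable under certain "conservative" assumptions and inherently non-identifiable without them.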

Abstract

This paper addresses the challenge of data generation beyond the training data and proposes a framework for Structural Extrapolated Data GEneration (SEDGE) based on suitable assumptions on the underlying data-generating process. We provide conditions under which data satisfying novel specifications can be generated reliably, together with the approximate identifiability of the distribution of such data under certain "conservative" assumptions, as well as the inherent non-identifiability of this distribution without such assumptions. On the algorithmic side, we develop practical methods to achieve extrapolated data generation, based on either a structure-informed optimization strategy or diffusion posterior sampling. We verify the extrapolation performance on synthetic data and also consider extrapolated image generation as a real-world scenario to illustrate the validity…
