Generative Distribution Embeddings: Lifting autoencoders to the space of distributions for multiscale representation learning

Nic Fishman; Gokul Gowri; Peng Yin; Jonathan Gootenberg; Omar Abudayyeh

arXiv:2505.18150 · cs.LG · February 23, 2026

TL;DR

The paper introduces generative distribution embeddings (GDE), a framework that extends autoencoders to operate on distributions, enabling multiscale representation learning and applications in computational biology.

Contribution

GDE lifts autoencoders to the space of distributions: it couples conditional generative models with encoder networks that satisfy distributional invariance, so that the learned representations describe whole distributions rather than single data points.
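The encoder side of this idea can be illustrated with a small sketch. It is not the authors' architecture. It assumes that distributional invariance means the embedding depends only on the empirical distribution of the input set, so that reordering or duplicating samples leaves it unchanged. Mean pooling over per-sample features is one simple construction with that property. The dimensions, weights, and the name `encode_set` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions and random weights, not taken from the paper.
D_IN, D_HID, D_LATENT = 2, 16, 4
W_in = rng.normal(size=(D_IN, D_HID))
b_in = rng.normal(size=D_HID)
W_out = rng.normal(size=(D_HID, D_LATENT))


def encode_set(samples: np.ndarray) -> np.ndarray:
    """Embed a set of samples with shape (n, D_IN) as one vector (D_LATENT,)."""
    features = np.tanh(samples @ W_in + b_in)  # per-sample features
    pooled = features.mean(axis=0)             # average over the set
    return pooled @ W_out                      # linear map into the latent space


samples = rng.normal(size=(50, D_IN))

z = encode_set(samples)
# Same empirical distribution, different ordering.
z_shuffled = encode_set(samples[rng.permutation(len(samples))])
# Same empirical distribution, twice as many samples.
z_duplicated = encode_set(np.concatenate([samples, samples]))
```

Because the pooling step is an average rather than a sum, the three embeddings agree up to floating-point error. A sum-pooled encoder would still be permutation invariant but would change when the set is duplicated.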

Findings

1. GDEs learn predictive sufficient statistics in Wasserstein space.
2. Latent GDE distances approximate the Wasserstein-2 distance.
3. GDEs outperform existing methods on synthetic benchmarks.
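For context on the Wasserstein-2 finding, the reference quantity has a well-known closed form when both distributions are Gaussian: $W_2^2 = \lVert m_a - m_b \rVert^2 + \operatorname{tr}\big(\Sigma_a + \Sigma_b - 2(\Sigma_a^{1/2} \Sigma_b \Sigma_a^{1/2})^{1/2}\big)$. The sketch below computes it with NumPy. It is a standalone illustration of the target distance, not code from the paper or its repository, and the function names `psd_sqrt` and `gaussian_w2` are hypothetical.

```python
import numpy as np


def psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def gaussian_w2(mean_a, cov_a, mean_b, cov_b) -> float:
    """Wasserstein-2 distance between N(mean_a, cov_a) and N(mean_b, cov_b)."""
    mean_a, mean_b = np.asarray(mean_a, float), np.asarray(mean_b, float)
    cov_a, cov_b = np.asarray(cov_a, float), np.asarray(cov_b, float)
    root_a = psd_sqrt(cov_a)
    cross = psd_sqrt(root_a @ cov_b @ root_a)
    squared = np.sum((mean_a - mean_b) ** 2) + np.trace(cov_a + cov_b - 2.0 * cross)
    return float(np.sqrt(max(squared, 0.0)))


# Pure translation: identical covariances, means separated by a 3-4-5 offset.
w2_shift = gaussian_w2([0.0, 0.0], np.eye(2), [3.0, 4.0], np.eye(2))

# Pure rescaling in 1D: standard deviations 1 and 2, equal means.
w2_scale = gaussian_w2([0.0], [[1.0]], [0.0], [[4.0]])
```

In the translation case the distance reduces to the Euclidean distance between the means, and in the 1D rescaling case it reduces to the difference of the standard deviations. One way to probe the finding, under these assumptions, would be to compare such values against distances between the corresponding latent embeddings.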

Abstract

Many real-world problems require reasoning across multiple scales, demanding models which operate not on single data points, but on entire distributions. We introduce generative distribution embeddings (GDE), a framework that lifts autoencoders to the space of distributions. In GDEs, an encoder acts on sets of samples, and the decoder is replaced by a generator which aims to match the input distribution. This framework enables learning representations of distributions by coupling conditional generative models with encoder networks which satisfy a criterion we call distributional invariance. We show that GDEs learn predictive sufficient statistics embedded in the Wasserstein space, such that latent GDE distances approximately recover the $W_{2}$ distance, and latent interpolation approximately recovers optimal transport trajectories for Gaussian and Gaussian mixture distributions. We…

Code & Models

Repositories

njwfish/distributionembeddings (PyTorch, official)
