Convergence Analysis of Flow Matching in Latent Space with Transformers

Yuling Jiao; Yanming Lai; Yang Wang; Bokai Yan

arXiv:2404.02538 · stat.ML · April 30, 2024 · 1 citation

Open Access

TL;DR

This paper provides theoretical convergence guarantees for flow matching in latent space using transformers, showing that the distribution of generated samples converges to the target distribution in Wasserstein-2 distance under mild and practical assumptions.

Contribution

The paper gives a convergence analysis for ODE-based generative models (flow matching) that use transformers in latent space, including an error analysis of the generated distribution and an approximation result for transformer networks.

Findings

01. The distribution of generated samples converges to the target distribution in Wasserstein-2 distance

02. Transformer networks with Lipschitz continuity can effectively approximate arbitrary smooth functions

03. The convergence guarantee holds under mild and practical assumptions

Abstract

We present theoretical convergence guarantees for ODE-based generative models, specifically flow matching. We use a pre-trained autoencoder network to map high-dimensional original inputs to a low-dimensional latent space, where a transformer network is trained to predict the velocity field of the transformation from a standard normal distribution to the target latent distribution. Our error analysis demonstrates the effectiveness of this approach, showing that the distribution of samples generated via estimated ODE flow converges to the target distribution in the Wasserstein-2 distance under mild and practical assumptions. Furthermore, we show that arbitrary smooth functions can be effectively approximated by transformer networks with Lipschitz continuity, which may be of independent interest.
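
The pipeline described in the abstract (sample from a standard normal, follow an ODE driven by a velocity field, compare the result to the target in Wasserstein-2 distance) can be illustrated with a toy sketch. This is not the paper's method: it assumes a hypothetical 1-D Gaussian "latent" target with illustrative parameters `M` and `S`, assumes the linear interpolation path X_t = (1 - t) Z + t X1, and uses the closed-form velocity for that path in place of a trained transformer. No autoencoder is involved. The Wasserstein-2 value is computed from the closed form for 1-D Gaussians fitted to the samples, which is only a proxy.

```python
import numpy as np

# Hypothetical 1-D "latent" target N(M, S^2); values chosen only for illustration.
M, S = 2.0, 0.5


def velocity(x, t):
    """Exact velocity E[X1 - Z | X_t = x] for the assumed path X_t = (1 - t) Z + t X1.

    Z ~ N(0, 1) and X1 ~ N(M, S^2) are independent. In the paper's setting this
    field would be estimated by a transformer; here it is known in closed form.
    """
    var_t = (1.0 - t) ** 2 + (t * S) ** 2   # Var(X_t)
    cov = t * S ** 2 - (1.0 - t)            # Cov(X1 - Z, X_t)
    return M + cov / var_t * (x - t * M)


def euler_flow(x0, n_steps=500):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with forward Euler."""
    x = np.array(x0, dtype=float)
    h = 1.0 / n_steps
    for k in range(n_steps):
        x = x + h * velocity(x, k * h)
    return x


def w2_gaussian_1d(m1, s1, m2, s2):
    """Closed-form Wasserstein-2 distance between N(m1, s1^2) and N(m2, s2^2)."""
    return float(np.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2))


def demo(n_samples=20000, seed=0):
    """Push N(0, 1) samples through the flow and measure the W2 proxy to the target."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)      # samples from the source distribution
    x1 = euler_flow(z)                      # generated samples at t = 1
    return w2_gaussian_1d(x1.mean(), x1.std(), M, S)


if __name__ == "__main__":
    print(demo())
```

In this toy case the exact flow maps a source point x0 to M + S * x0, so the remaining gap reported by `demo()` comes from Euler discretization and finite sampling. The paper's analysis additionally accounts for the error of the estimated velocity field, which this sketch does not model.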

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis