TL;DR
AsymFlow introduces a rank-asymmetric velocity parameterization for flow models, enabling high-quality pixel-space image generation and seamless finetuning from latent models, achieving state-of-the-art results on ImageNet and text-to-image tasks.
Contribution
The paper proposes AsymFlow, a novel asymmetric velocity parameterization that improves flow-based image generation and allows effective finetuning from latent models without architectural changes.
Findings
Achieves 1.57 FID on ImageNet 256×256, outperforming prior DiT/JiT-like pixel diffusion models.
Enables finetuning latent flow models into pixel-space models effectively.
Sets a new state of the art in pixel-space text-to-image generation.
Abstract
Flow-based generation in high-dimensional spaces is difficult because velocity prediction requires modeling high-dimensional noise, even when data has strong low-rank structure. We present Asymmetric Flow Modeling (AsymFlow), a rank-asymmetric velocity parameterization that restricts noise prediction to a low-rank subspace while keeping data prediction full-dimensional. From this asymmetric prediction, AsymFlow analytically recovers the full-dimensional velocity without changing the network architecture or training/sampling procedures. On ImageNet 256×256, AsymFlow achieves a leading 1.57 FID, outperforming prior DiT/JiT-like pixel diffusion models by a large margin. AsymFlow also provides the first-ever route for finetuning pretrained latent flow models into pixel-space models: aligning the low-rank pixel subspace to the latent space gives a seamless initialization that…
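To make the idea of recovering a full-dimensional velocity from an asymmetric prediction concrete, here is a minimal NumPy sketch. It is not taken from the paper, and every specific in it is an assumption for illustration: the interpolation x_t = (1 - t) * x0 + t * eps, the velocity v = eps - x0, the target u = x0 - P @ eps (full-dimensional data minus noise projected onto a low-rank subspace), the random projector P, and the helper names. The paper's actual parameterization may differ.

```python
import numpy as np

def make_projector(dim, rank, seed=0):
    """Orthogonal projector onto a random rank-`rank` subspace (hypothetical choice)."""
    rng = np.random.default_rng(seed)
    # Reduced QR gives q with orthonormal columns, shape (dim, rank).
    q, _ = np.linalg.qr(rng.standard_normal((dim, rank)))
    return q @ q.T

def asymmetric_target(x0, eps, P):
    """Assumed target: full-dimensional data minus low-rank (projected) noise."""
    return x0 - P @ eps

def recover_velocity(u, x_t, t, P):
    """Recover v = eps - x0 from u, assuming x_t = (1 - t) * x0 + t * eps and t > 0."""
    u_par = P @ u              # inside the subspace: u = x0 - eps = -v
    u_perp = u - u_par         # outside the subspace: u = x0
    x_perp = x_t - P @ x_t
    v_par = -u_par
    # Outside the subspace, eps = (x_t - (1 - t) * x0) / t, so v = (x_t - x0) / t.
    v_perp = (x_perp - u_perp) / t
    return v_par + v_perp

# Usage: check the recovery against the true velocity on synthetic vectors.
rng = np.random.default_rng(1)
dim, rank, t = 16, 4, 0.3
P = make_projector(dim, rank)
x0 = rng.standard_normal(dim)
eps = rng.standard_normal(dim)
x_t = (1 - t) * x0 + t * eps
u = asymmetric_target(x0, eps, P)
v_hat = recover_velocity(u, x_t, t, P)
```

Under these assumptions, the recovery is exact when u is the true target. The division by t means the complement of the subspace is ill-conditioned near t = 0, which a real sampler would need to handle.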
