SimFlow: Simplified and End-to-End Training of Latent Normalizing Flows

Qinyu Zhao; Guangting Zheng; Tao Yang; Rui Zhu; Xingjian Leng; Stephen Gould; Liang Zheng

arXiv:2512.04084 · cs.CV · December 4, 2025

TL;DR

SimFlow fixes the VAE variance to a constant, which enables end-to-end training of latent normalizing flows, improves image generation quality, and yields state-of-the-art performance on ImageNet 256x256.

Contribution

The paper proposes fixing the VAE variance to simplify training and improve the quality of latent normalizing flows, enabling end-to-end training without complex pipelines.

Findings

01 Achieves a gFID of 2.15 on ImageNet 256x256, outperforming STARFlow.

02 Integrating with REPA-E further improves gFID to 1.91.

03 Simplifies training by fixing variance, avoiding additional noise or denoising steps.

Abstract

Normalizing Flows (NFs) learn invertible mappings between the data and a Gaussian distribution. Prior works usually suffer from two limitations. First, they add random noise to training samples or VAE latents as data augmentation, introducing complex pipelines including extra noising and denoising steps. Second, they use a pretrained and frozen VAE encoder, resulting in suboptimal reconstruction and generation quality. In this paper, we find that the two issues can be solved in a very simple way: just fixing the variance (which would otherwise be predicted by the VAE encoder) to a constant (e.g., 0.5). On the one hand, this method allows the encoder to output a broader distribution of tokens and the decoder to learn to reconstruct clean images from the augmented token distribution, avoiding additional noise or denoising design. On the other hand, fixed variance simplifies the VAE…
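The fixed-variance idea described in the abstract can be illustrated with a minimal sketch. It assumes the usual VAE reparameterization, where a latent is drawn as the encoder mean plus Gaussian noise, and it reads the example constant 0.5 as the variance (not the standard deviation). The names `FIXED_VARIANCE` and `sample_latent` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math
import random

# Example constant from the abstract; interpreted here as sigma^2 (assumption).
FIXED_VARIANCE = 0.5


def sample_latent(mean, variance=FIXED_VARIANCE, rng=random):
    """Draw z = mean + sqrt(variance) * eps with eps ~ N(0, 1) per dimension.

    `mean` stands in for the encoder output. In a standard VAE the encoder
    would also predict a per-dimension variance; here it is a fixed constant.
    """
    std = math.sqrt(variance)
    return [m + std * rng.gauss(0.0, 1.0) for m in mean]


def empirical_stats(values):
    """Return (mean, variance) of a list of floats, using the population variance."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, var


if __name__ == "__main__":
    rng = random.Random(0)
    # Repeatedly sample a 1-D latent around a hypothetical encoder mean of 1.0.
    draws = [sample_latent([1.0], rng=rng)[0] for _ in range(20000)]
    print(empirical_stats(draws))
```

Under these assumptions, the decoder always sees latents perturbed by noise of the same scale, which is one way to read the abstract's point that the decoder learns to reconstruct clean images from an augmented token distribution without a separate noising or denoising stage.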


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Image and Signal Denoising Methods