Pixel-Space Post-Training of Latent Diffusion Models
Christina Zhang, Simran Motwani, Matthew Yu, Ji Hou, Felix Juefei-Xu, Sam Tsai, Peter Vajda, Zijian He, Jialiang Wang

TL;DR
This paper introduces a pixel-space supervision method for post-training latent diffusion models, significantly enhancing high-frequency detail preservation and visual quality without sacrificing text alignment accuracy.
Contribution
It proposes a novel pixel-space post-training approach for LDMs, addressing high-frequency detail issues and improving visual quality in image generation.
Findings
Pixel-space supervision improves high-frequency detail preservation.
Improves visual quality and visual flaw metrics in LDMs.
Maintains text alignment quality after post-training.
Abstract
Latent diffusion models (LDMs) have made significant advancements in the field of image generation in recent years. One major advantage of LDMs is their ability to operate in a compressed latent space, allowing for more efficient training and deployment. However, despite these advantages, challenges with LDMs still remain. For example, it has been observed that LDMs often generate high-frequency details and complex compositions imperfectly. We hypothesize that one reason for these flaws is that all pre- and post-training of LDMs is done in latent space, which typically has lower spatial resolution than the output images. To address this issue, we propose adding pixel-space supervision in the post-training process to better preserve high-frequency details. Experimentally, we show that adding a pixel-space objective significantly improves both supervised…
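As a rough illustration of the hypothesis in the abstract, the sketch below combines a latent-space loss with a pixel-space loss computed after decoding. It is a toy under stated assumptions, not the authors' objective: the names mse, toy_decode, combined_loss, and pixel_weight are hypothetical, the "decoder" is plain nearest-neighbour upsampling standing in for a learned VAE decoder, and the weighted sum is only one plausible way to add a pixel-space term.

```python
# Toy sketch only: every name here is hypothetical and the decoder is a
# stand-in (nearest-neighbour upsampling), not a learned VAE decoder.

def mse(a, b):
    """Mean squared error between two equally sized 2D grids (lists of rows)."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def toy_decode(latent, factor=2):
    """Map a low-resolution latent grid to a higher-resolution 'image'
    by repeating each value `factor` times along both axes."""
    image = []
    for row in latent:
        up_row = [v for v in row for _ in range(factor)]
        for _ in range(factor):
            image.append(list(up_row))
    return image


def combined_loss(pred_latent, target_latent, target_image, pixel_weight=1.0):
    """Return (total, latent_loss, pixel_loss).

    latent_loss compares latents at low spatial resolution;
    pixel_loss compares the decoded prediction with the full-resolution target.
    """
    latent_loss = mse(pred_latent, target_latent)
    pixel_loss = mse(toy_decode(pred_latent), target_image)
    return latent_loss + pixel_weight * pixel_loss, latent_loss, pixel_loss


# A 4x4 checkerboard: pure high-frequency detail whose 2x2 block averages are 0.5.
target_image = [[(r + c) % 2 for c in range(4)] for r in range(4)]
# Assume (for this toy) that the latent target is the 2x2 block average.
target_latent = [[0.5, 0.5], [0.5, 0.5]]
pred_latent = [[0.5, 0.5], [0.5, 0.5]]

total, latent_loss, pixel_loss = combined_loss(pred_latent, target_latent, target_image)
print(latent_loss, pixel_loss, total)
```

In this toy setup the latent-space loss is zero even though the decoded output misses the checkerboard entirely, while the pixel-space term is non-zero. That is the kind of gap a pixel-space objective could expose; how the paper actually defines and weights its objective is not stated in the excerpt above.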
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Speech Recognition and Synthesis
Methods: Convolution · Concatenated Skip Connection · Diffusion · Max Pooling · U-Net
