Exploring Representation-Aligned Latent Space for Better Generation

Wanghan Xu; Xiaoyu Yue; Zidong Wang; Yao Teng; Wenlong Zhang; Xihui Liu; Luping Zhou; Wanli Ouyang; Lei Bai

arXiv:2502.00359 · cs.LG · February 4, 2025

TL;DR

This paper introduces ReaLS (Representation-Aligned Latent Space), a method that integrates semantic priors into the latent space of latent diffusion models, improving generation quality (a reported 15% FID improvement) and downstream task performance.

Contribution

ReaLS aligns the VAE latent space used by latent diffusion models with semantic priors, improving both generation quality and performance on downstream tasks such as segmentation and depth estimation.
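The page does not say how the alignment is implemented. As a rough illustration only, one common way to align a latent space with semantic priors is to project each latent token into the feature space of a frozen pretrained encoder and penalize low cosine similarity. The sketch below assumes that form; `alignment_loss`, the linear `proj` matrix, and the random stand-in features are hypothetical and are not taken from the ReaLS paper.

```python
import numpy as np

def alignment_loss(vae_latents, semantic_feats, proj):
    """Negative mean cosine similarity between projected VAE latents and
    frozen semantic features (hypothetical sketch, not the paper's loss).

    vae_latents:    (N, C_latent) latent vectors, one per spatial token
    semantic_feats: (N, C_feat) features from a frozen pretrained encoder
    proj:           (C_latent, C_feat) learnable linear projection (assumed)
    """
    z = vae_latents @ proj  # map latents into the semantic feature space
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)
    f = semantic_feats / (np.linalg.norm(semantic_feats, axis=1, keepdims=True) + 1e-8)
    cos = np.sum(z * f, axis=1)  # per-token cosine similarity
    return float(-cos.mean())    # lower value = better aligned

# Usage with random stand-in data: 16 tokens, 4 latent channels, 8 feature dims.
rng = np.random.default_rng(0)
latents = rng.normal(size=(16, 4))
proj = rng.normal(size=(4, 8))
aligned_feats = latents @ proj           # targets that are perfectly aligned
random_feats = rng.normal(size=(16, 8))  # targets unrelated to the latents
print(alignment_loss(latents, aligned_feats, proj))  # close to -1.0
print(alignment_loss(latents, random_feats, proj))   # near 0
```

During training, a term like this would be added to the usual VAE reconstruction objective so the latents stay decodable while also carrying semantic structure; how ReaLS weights or structures these terms is not stated on this page.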

Findings

1. 15% improvement in the FID metric with ReaLS
2. Enhanced performance on segmentation and depth estimation tasks
3. ReaLS improves the quality of latent representations in diffusion models
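FID (Fréchet Inception Distance) compares the mean and covariance of feature embeddings of real and generated images, and lower is better. Assuming the reported 15% is a relative reduction, it means the ReaLS model's FID is 15% lower than the baseline's. The NumPy sketch below computes the Fréchet distance between two Gaussians from their statistics, plus that relative reduction. It assumes the feature statistics are already available (normally extracted with an Inception network, omitted here); the function names are illustrative.

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^(1/2))."""
    # Symmetric square root of sigma1 via its eigendecomposition.
    w, v = np.linalg.eigh(sigma1)
    sqrt_s1 = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    # Tr((sigma1 sigma2)^(1/2)) equals the sum of square roots of the
    # eigenvalues of the symmetric PSD matrix sqrt_s1 @ sigma2 @ sqrt_s1.
    m = sqrt_s1 @ sigma2 @ sqrt_s1
    tr_covmean = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(m), 0.0, None)))
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)

def relative_improvement(fid_baseline, fid_new):
    """Fractional reduction in FID (lower FID is better); 0.15 means 15%."""
    return (fid_baseline - fid_new) / fid_baseline

# Usage with synthetic 3-D "features" standing in for Inception embeddings.
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 3))
mu, sigma = feats.mean(axis=0), np.cov(feats, rowvar=False)
print(frechet_distance(mu, sigma, mu, sigma))        # approximately 0
print(frechet_distance(mu, sigma, mu + 1.0, sigma))  # approximately 3.0
print(relative_improvement(10.0, 8.5))               # approximately 0.15
```

Widely used FID implementations compute the matrix square root with `scipy.linalg.sqrtm`; the eigendecomposition route here gives the same trace while depending only on NumPy.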

Abstract

Generative models serve as powerful tools for modeling the real world, with mainstream diffusion models, particularly those based on the latent diffusion model paradigm, achieving remarkable progress across various tasks such as image and video synthesis. Latent diffusion models are typically trained on the latents of a Variational Autoencoder (VAE), operating on VAE latents rather than on real samples. While this generative paradigm speeds up training and inference, the quality of the generated outputs is limited by the quality of the latents. Traditional VAE latents are often viewed as a spatial compression of pixel space and lack explicit semantic representations, which are essential for modeling the real world. In this paper, we introduce ReaLS (Representation-Aligned Latent Space), which integrates semantic priors to improve generation performance. Extensive experiments show that fundamental DiT…
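To make the latent diffusion paradigm described in the abstract concrete, here is a toy NumPy sketch of a DDPM-style noise-prediction loss computed on latents instead of pixels. The linear `encode` stand-in for a VAE encoder, the linear beta schedule, and the `denoiser` callable are assumptions for illustration; none of them is the ReaLS or DiT implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard DDPM linear beta schedule (assumed; not specified on this page).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal level per timestep

def encode(x, enc_weights):
    """Hypothetical VAE-encoder stand-in: a fixed linear map from D pixel
    values to d latent dimensions (d < D)."""
    return x @ enc_weights

def latent_diffusion_loss(x, enc_weights, denoiser):
    """Noise-prediction MSE computed on latents z0 = encode(x), not on x."""
    z0 = encode(x, enc_weights)
    t = rng.integers(0, T, size=z0.shape[0])       # one timestep per sample
    a = alpha_bar[t][:, None]
    eps = rng.normal(size=z0.shape)
    zt = np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps  # forward process q(z_t | z_0)
    eps_hat = denoiser(zt, t)                      # model predicts the added noise
    return float(np.mean((eps_hat - eps) ** 2))

def zero_denoiser(zt, t):
    """Trivial baseline model that always predicts zero noise."""
    return np.zeros_like(zt)

# Usage with toy data: 256 "images" of 64 pixels compressed to 16 latent dims.
x = rng.normal(size=(256, 64))
enc_weights = rng.normal(size=(64, 16)) / 8.0
loss = latent_diffusion_loss(x, enc_weights, zero_denoiser)
print(loss)  # about 1.0, the variance of the target noise
```

Because the loss only ever sees `encode(x)`, whatever structure the encoder puts into the latents (or fails to) is all the diffusion model can learn from, which is the limitation the abstract attributes to purely compressive VAE latents.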

Taxonomy

Topics: Natural Language Processing Techniques