Geometric Autoencoder for Diffusion Models
Hangyu Liu, Jianyong Wang, Yutao Sun

TL;DR
The paper introduces the Geometric Autoencoder (GAE), a framework that improves the semantic alignment, reconstruction fidelity, and latent compactness of latent diffusion models, yielding state-of-the-art high-resolution image generation.
Contribution
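GAE addresses three competing goals in latent design (semantic discriminability, reconstruction fidelity, and latent compactness) through a low-dimensional semantic supervision target derived from Vision Foundation Models and a latent normalization that replaces the KL-divergence term of standard VAEs, surpassing existing methods in quality and stability.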
Findings
Achieves a gFID of 1.82 at 80 epochs on ImageNet-1K 256x256
Reaches a gFID of 1.31 at 800 epochs without Classifier-Free Guidance
Outperforms previous state-of-the-art methods in high-resolution image generation
Abstract
Latent diffusion models have established a new state of the art in high-resolution visual generation. Integrating Vision Foundation Model (VFM) priors improves generative efficiency, yet existing latent designs remain largely heuristic. These approaches often struggle to unify semantic discriminability, reconstruction fidelity, and latent compactness. In this paper, we propose Geometric Autoencoder (GAE), a principled framework that systematically addresses these challenges. By analyzing various alignment paradigms, GAE constructs an optimized low-dimensional semantic supervision target from VFMs to guide the autoencoder. Furthermore, we replace the restrictive KL-divergence of standard VAEs with latent normalization, enabling a more stable latent manifold specifically optimized for diffusion learning. To ensure robust reconstruction under high-intensity noise, GAE…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Face recognition and analysis
