Geometric Autoencoder for Diffusion Models

Hangyu Liu; Jianyong Wang; Yutao Sun

arXiv:2603.10365 · cs.CV · March 13, 2026

TL;DR

The paper introduces Geometric Autoencoder (GAE), a framework that improves the semantic alignment, reconstruction fidelity, and compactness of the latent space used by latent diffusion models, and reports state-of-the-art generation results on ImageNet-1K 256x256.

Contribution

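GAE addresses three competing requirements of latent design in latent diffusion models: semantic discriminability, reconstruction fidelity, and latent compactness. It does so with a low-dimensional semantic supervision target derived from Vision Foundation Models (VFMs) and with latent normalization in place of the KL-divergence term of standard VAEs. The authors report better generation quality and stability than existing methods.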

Findings

01  Achieves a gFID of 1.82 at 80 epochs on ImageNet-1K 256x256

02  Reaches a gFID of 1.31 at 800 epochs without Classifier-Free Guidance

03  Outperforms previous state-of-the-art methods in high-resolution image generation

Abstract

Latent diffusion models have established a new state-of-the-art in high-resolution visual generation. Integrating Vision Foundation Model priors improves generative efficiency, yet existing latent designs remain largely heuristic. These approaches often struggle to unify semantic discriminability, reconstruction fidelity, and latent compactness. In this paper, we propose Geometric Autoencoder (GAE), a principled framework that systematically addresses these challenges. By analyzing various alignment paradigms, GAE constructs an optimized low-dimensional semantic supervision target from VFMs to provide guidance for the autoencoder. Furthermore, we leverage latent normalization that replaces the restrictive KL-divergence of standard VAEs, enabling a more stable latent manifold specifically optimized for diffusion learning. To ensure robust reconstruction under high-intensity noise, GAE…
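
The abstract does not say which normalization GAE applies, so the following is only a minimal sketch of the general idea, assuming a parameter-free per-sample standardization over latent channels as a stand-in. It contrasts the KL regularizer of a standard VAE, which is a loss term that pulls the posterior toward N(0, I), with a normalization that fixes the latent scale by construction. The function names `kl_to_standard_normal` and `normalize_latent` are hypothetical and do not come from the paper.

```python
import numpy as np


def kl_to_standard_normal(mu, logvar):
    """Standard VAE regularizer: KL(N(mu, sigma^2) || N(0, I)).

    Summed over latent dimensions, averaged over the batch.
    """
    kl = 0.5 * (np.exp(logvar) + mu ** 2 - 1.0 - logvar)
    return kl.sum(axis=1).mean()


def normalize_latent(z, eps=1e-6):
    """Hypothetical stand-in for latent normalization (an assumption,
    not the paper's definition): standardize each latent vector to
    zero mean and unit variance across its channels.
    """
    mean = z.mean(axis=1, keepdims=True)
    var = z.var(axis=1, keepdims=True)
    return (z - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)

# Encoder outputs with an arbitrary offset and scale: (batch, channels).
z = rng.normal(loc=3.0, scale=5.0, size=(4, 32))

# KL route: the mismatch with N(0, I) shows up as a penalty to minimize.
kl_penalty = kl_to_standard_normal(mu=z, logvar=np.zeros_like(z))

# Normalization route: the scale is fixed directly, with no loss term.
z_norm = normalize_latent(z)
```

In this sketch the diffusion model would always see latents with a fixed per-sample mean and variance, whereas with the KL term the latent scale depends on how the penalty is weighted against the reconstruction loss.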

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Face recognition and analysis