A Tilted Seesaw: Revisiting Autoencoder Trade-off for Controllable Diffusion

Pu Cao; Yiyang Ma; Feng Zhou; Xuedan Yin; Qing Song; Lu Yang

arXiv:2601.21633 · cs.CV · January 30, 2026

Open Access

TL;DR

This paper critically examines autoencoder evaluation metrics in latent diffusion models and shows that reconstruction fidelity predicts controllability better than generative metrics such as gFID, particularly when scaling to controllable diffusion tasks.

Contribution

The paper provides a theoretical and empirical analysis of the limitations of gFID-focused evaluation and proposes a more reliable multi-dimensional assessment of controllability in diffusion models.

Findings

01. gFID is weakly predictive of condition preservation

02. Reconstruction metrics better indicate controllability

03. Autoencoder evaluation bias affects controllable diffusion performance

Abstract

In latent diffusion models, the autoencoder (AE) is typically expected to balance two capabilities: faithful reconstruction and a generation-friendly latent space (e.g., low gFID). In recent ImageNet-scale AE studies, we observe a systematic bias toward generative metrics in handling this trade-off: reconstruction metrics are increasingly under-reported, and ablation-based AE selection often favors the best-gFID configuration even when reconstruction fidelity degrades. We theoretically analyze why this gFID-dominant preference can appear unproblematic for ImageNet generation, yet becomes risky when scaling to controllable diffusion: AEs can induce condition drift, which limits achievable condition alignment. Meanwhile, we find that reconstruction fidelity, especially instance-level measures, better indicates controllability. We empirically validate the impact of tilted autoencoder…
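To make "instance-level" reconstruction fidelity concrete, here is a minimal sketch of a per-image score computed between each input and its autoencoder reconstruction. PSNR is used only as a familiar example of such a measure; the visible abstract does not name the specific metrics the authors use, so the choice of PSNR, the function names, and the flat-list image representation are all assumptions made for illustration. The contrast is with gFID, which compares distributions of images rather than scoring each image against its own source.

```python
import math
from typing import List, Sequence


def mse(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared error between two images given as flat pixel sequences."""
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("images must be non-empty and have the same size")
    total = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return total / len(original)


def psnr(original: Sequence[float], reconstructed: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for ONE image pair (instance-level).

    Illustrative choice of metric; not taken from the paper.
    """
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf  # identical images: reconstruction is exact
    return 10.0 * math.log10(max_value ** 2 / err)


def per_image_psnr(originals: Sequence[Sequence[float]],
                   reconstructions: Sequence[Sequence[float]],
                   max_value: float = 255.0) -> List[float]:
    """Score every image against its own reconstruction, one value per image."""
    if len(originals) != len(reconstructions):
        raise ValueError("need one reconstruction per original image")
    return [psnr(o, r, max_value) for o, r in zip(originals, reconstructions)]


if __name__ == "__main__":
    # Two tiny hypothetical "images" and their reconstructions.
    originals = [[10.0, 20.0], [0.0, 255.0]]
    reconstructions = [[12.0, 18.0], [0.0, 255.0]]
    print(per_image_psnr(originals, reconstructions))
```

Keeping one score per image, rather than a single dataset-level number, makes it possible to inspect the worst-reconstructed instances, which is where a loss of fine detail would be most visible.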

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Domain Adaptation and Few-Shot Learning