One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation
Yuan Gao, Chen Chen, Tianrong Chen, Jiatao Gu

TL;DR
This paper introduces FAE (Feature Auto-Encoder), a simple framework that adapts pre-trained visual representations into low-dimensional latents for image generation, reaching high sample quality with as little as a single attention layer.
Contribution
FAE is a minimal approach that adapts pre-trained features for generative models using just one attention layer. It is compatible with various encoders and generative methods.
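As a rough illustration of what adapting features with a single attention layer could look like, the NumPy sketch below passes stand-in encoder features through one self-attention layer and projects them to a low-dimensional latent. The class name `SingleAttentionAdapter`, the token count, the feature and latent sizes, and the random (untrained) weights are all assumptions made for illustration. This is not FAE's actual architecture or training objective, which this summary does not specify.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class SingleAttentionAdapter:
    """Hypothetical adapter: one self-attention layer, then a linear
    projection from the encoder's feature size to a small latent size."""

    def __init__(self, feat_dim, latent_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = feat_dim ** -0.5
        # Random weights stand in for parameters that would be learned.
        self.w_q = rng.normal(0.0, scale, (feat_dim, feat_dim))
        self.w_k = rng.normal(0.0, scale, (feat_dim, feat_dim))
        self.w_v = rng.normal(0.0, scale, (feat_dim, feat_dim))
        self.w_out = rng.normal(0.0, scale, (feat_dim, latent_dim))

    def __call__(self, feats):
        # feats: (num_tokens, feat_dim) patch features from a frozen encoder.
        q = feats @ self.w_q
        k = feats @ self.w_k
        v = feats @ self.w_v
        # Scaled dot-product attention over all tokens.
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
        mixed = feats + attn @ v  # residual connection
        # Project each token down to the low-dimensional latent.
        return mixed @ self.w_out


# Stand-in for encoder output: 256 patch tokens of width 768 (assumed sizes).
feats = np.random.default_rng(1).normal(size=(256, 768))
adapter = SingleAttentionAdapter(feat_dim=768, latent_dim=32)
latents = adapter(feats)
print(latents.shape)  # (256, 32)
```

In a real pipeline the weights would be trained, for example so that a decoder can reconstruct the original features from `latents`, and a generative model would then be trained on the latents rather than on the high-dimensional features.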
Findings
Achieves near state-of-the-art FID on ImageNet 256x256.
Performs well with as few as one attention layer.
Demonstrates fast learning and high-quality image generation.
Abstract
Visual generative models (e.g., diffusion models) typically operate in compressed latent spaces to balance training efficiency and sample quality. In parallel, there has been growing interest in leveraging high-quality pre-trained visual representations, either by aligning them inside VAEs or directly within the generative model. However, adapting such representations remains challenging due to fundamental mismatches between understanding-oriented features and generation-friendly latent spaces. Representation encoders benefit from high-dimensional latents that capture diverse hypotheses for masked regions, whereas generative models favor low-dimensional latents that must faithfully preserve injected noise. This discrepancy has led prior work to rely on complex objectives and architectures. In this work, we propose FAE (Feature Auto-Encoder), a simple yet effective framework that adapts…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Face recognition and analysis
