One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation

Yuan Gao; Chen Chen; Tianrong Chen; Jiatao Gu

arXiv:2512.07829 · cs.CV · December 17, 2025

Open Access

TL;DR

This paper introduces FAE (Feature Auto-Encoder), a simple framework that adapts pre-trained visual representations into low-dimensional latents for image generation, achieving high sample quality with as little as a single attention layer.

Contribution

FAE is a novel, minimalistic approach that effectively adapts pre-trained features for generative models using just one attention layer, compatible with various encoders and generative methods.
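
To make the idea of a one-attention-layer adapter concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: this page does not specify the layer layout, so the single-head attention, the residual connection, the final linear projection, the random weights, and the sizes (256 tokens, 768-dimensional features, 32-dimensional latents) are all hypothetical choices, as are the function names.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def init_params(feat_dim, latent_dim, seed=0):
    """Random weights for one attention layer plus an output projection.

    Hypothetical initialization; a real adapter would learn these weights.
    """
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(feat_dim)
    return {
        "wq": rng.normal(0.0, scale, (feat_dim, feat_dim)),
        "wk": rng.normal(0.0, scale, (feat_dim, feat_dim)),
        "wv": rng.normal(0.0, scale, (feat_dim, feat_dim)),
        "w_out": rng.normal(0.0, scale, (feat_dim, latent_dim)),
    }


def attention_weights(features, params):
    """Single-head scaled dot-product attention weights, shape (tokens, tokens)."""
    q = features @ params["wq"]
    k = features @ params["wk"]
    return softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)


def encode(features, params):
    """Map (num_tokens, feat_dim) features to (num_tokens, latent_dim) latents."""
    v = features @ params["wv"]
    # Assumed residual connection around the single attention layer.
    mixed = features + attention_weights(features, params) @ v
    # Linear projection down to the low-dimensional latent space.
    return mixed @ params["w_out"]


# Stand-in for frozen pre-trained encoder features (assumed sizes).
features = np.random.default_rng(1).normal(size=(256, 768))
params = init_params(feat_dim=768, latent_dim=32)
latents = encode(features, params)
print(latents.shape)  # (256, 32)
```

The sketch only shows the shape of the computation: high-dimensional token features go in, one attention layer mixes them, and a projection yields compact latents that a generative model could operate on. Training objectives and the decoder are omitted because this page does not describe them.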

Findings

01. Achieves near state-of-the-art FID on ImageNet 256x256.

02. Performs well with as few as one attention layer.

03. Demonstrates fast learning and high-quality image generation.

Abstract

Visual generative models (e.g., diffusion models) typically operate in compressed latent spaces to balance training efficiency and sample quality. In parallel, there has been growing interest in leveraging high-quality pre-trained visual representations, either by aligning them inside VAEs or directly within the generative model. However, adapting such representations remains challenging due to fundamental mismatches between understanding-oriented features and generation-friendly latent spaces. Representation encoders benefit from high-dimensional latents that capture diverse hypotheses for masked regions, whereas generative models favor low-dimensional latents that must faithfully preserve injected noise. This discrepancy has led prior work to rely on complex objectives and architectures. In this work, we propose FAE (Feature Auto-Encoder), a simple yet effective framework that adapts…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Face Recognition and Analysis