DecQ: Detail-Condensing Queries for Enhanced Reconstruction and Generation in Representation Autoencoders

Tianhang Wang; Yitong Chen; Wei Song; Zuxuan Wu; Min Li; Jiaqi Wang

arXiv:2605.22777 · cs.CV · May 22, 2026

TL;DR

DecQ introduces detail-condensing queries that aggregate fine-grained information from intermediate VFM features. This eases the trade-off between reconstruction fidelity and generative quality in Representation Autoencoders (RAEs).

Contribution

DecQ proposes a lightweight framework in which detail-condensing queries improve both reconstruction quality and generative performance in RAEs.

Findings

01

Improves reconstruction PSNR from 19.13 dB to 22.76 dB with minimal additional computation.

02

Achieves 3.3× faster convergence in generative modeling.

03

Attains an FID of 1.41 without guidance and 1.05 with guidance.
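The PSNR gain can be stated more concretely. PSNR = 10·log10(MAX²/MSE), so a gain of 22.76 − 19.13 = 3.63 dB means the mean squared error shrinks by a factor of 10^0.363 ≈ 2.31. The sketch below assumes pixel values scaled to [0, 1] (MAX = 1) and treats each reported figure as a single aggregate PSNR. If the paper averages per-image PSNR, the MSE ratio is only approximate. The helper names are illustrative and do not come from the paper.

```python
import math

def psnr_to_mse(psnr_db: float, max_val: float = 1.0) -> float:
    """Invert PSNR = 10 * log10(max_val**2 / mse) to recover the MSE."""
    return max_val ** 2 / (10 ** (psnr_db / 10))

def mse_reduction_factor(psnr_before: float, psnr_after: float) -> float:
    """Factor by which MSE shrinks for a given PSNR gain (independent of max_val)."""
    return 10 ** ((psnr_after - psnr_before) / 10)

before, after = 19.13, 22.76  # reconstruction PSNR in dB, as reported above
print(f"MSE before: {psnr_to_mse(before):.5f}")
print(f"MSE after:  {psnr_to_mse(after):.5f}")
print(f"MSE reduced by {mse_reduction_factor(before, after):.2f}x")
```

Because PSNR is logarithmic, each additional 3 dB roughly halves the MSE. The 3.63 dB gain therefore amounts to slightly more than a halving of the reconstruction error.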

Abstract

Representation Autoencoders (RAEs) leverage frozen vision foundation models (VFMs) as tokenizer encoders, providing robust high-level representations that facilitate fast convergence and high-quality generation in latent diffusion models. However, freezing the VFM inherently constrains its spatial reconstruction capacity, limiting fine-grained generation and image editing; in contrast, incorporating reconstruction-oriented signals via fine-tuning disrupts the pretrained semantic space and degrades generative fidelity. To address this trade-off, we propose DecQ, a simple yet effective framework for RAEs. Specifically, DecQ introduces lightweight detail-condensing queries that extract fine-grained information from intermediate VFM features through condenser modules. These queries are incorporated into the decoder to support reconstruction and are jointly generated with patch tokens during…
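The abstract describes the mechanism without specifying the condenser architecture. The following is a minimal sketch of one plausible reading. It assumes each condenser is a single-head cross-attention block in which a small set of queries attends to one intermediate VFM layer and absorbs its details through a residual update. It also assumes the condensed queries are appended to the patch-token sequence that the decoder consumes. The class `Condenser`, the function `condense`, all dimensions, and the random stand-in features are hypothetical. The paper's condensers may use multiple heads, normalization, MLPs, or a different fusion scheme.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class Condenser:
    """Hypothetical single-head cross-attention condenser: queries attend over
    one intermediate feature map and absorb its fine-grained details."""

    def __init__(self, dim: int, rng: np.random.Generator):
        scale = 1.0 / np.sqrt(dim)
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))
        self.dim = dim

    def __call__(self, queries: np.ndarray, feats: np.ndarray) -> np.ndarray:
        # queries: (n_queries, dim); feats: (n_patches, dim)
        q = queries @ self.w_q
        k = feats @ self.w_k
        v = feats @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(self.dim))  # (n_queries, n_patches)
        return queries + attn @ v  # residual update of the queries

def condense(intermediate_feats, queries, condensers):
    """Pass the queries through one condenser per intermediate VFM layer."""
    for feats, condenser in zip(intermediate_feats, condensers):
        queries = condenser(queries, feats)
    return queries

rng = np.random.default_rng(0)
dim, n_patches, n_queries, n_layers = 64, 256, 16, 3
# Random stand-ins for frozen-VFM outputs: intermediate layers and final patch tokens.
intermediate = [rng.normal(size=(n_patches, dim)) for _ in range(n_layers)]
patch_tokens = rng.normal(size=(n_patches, dim))

init_queries = rng.normal(0.0, 0.02, (n_queries, dim))  # would be learnable in training
condensers = [Condenser(dim, rng) for _ in range(n_layers)]
detail_queries = condense(intermediate, init_queries, condensers)

# Assumed decoder input: semantic patch tokens followed by the condensed detail queries.
decoder_input = np.concatenate([patch_tokens, detail_queries], axis=0)
print(decoder_input.shape)
```

In this reading, the queries form a small, fixed-size channel for spatial detail that sits alongside the frozen semantic patch tokens. That would also let a diffusion model generate queries and patch tokens jointly as one sequence, as the abstract describes.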

Code & Models

Repositories

tianhang-wang/DecQ (GitHub)