Improving Reconstruction of Representation Autoencoder

Siyu Liu; Chujie Qin; Hubery Yin; Qixin Yan; Zheng-Peng Duan; Chen Li; Jing Lyu; Chun-Le Guo; Chongyi Li

arXiv:2602.08620·cs.CV·February 10, 2026

TL;DR

This paper introduces LV-RAE, a representation autoencoder that enhances image reconstruction fidelity by combining semantic features with low-level details, and improves generative quality through decoder fine-tuning and noise smoothing.

Contribution

LV-RAE is a novel autoencoder that augments semantic features with low-level information and employs decoder fine-tuning and noise injection for improved reconstruction and generation.
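
The noise injection mentioned above can be illustrated with a minimal sketch. This page does not specify the noise model, its scale, or how it enters decoder fine-tuning, so everything here is an assumption for illustration: i.i.d. Gaussian noise is added to a latent vector before it would be passed to the decoder, and the names `perturb_latent` and `sigma` are hypothetical, not taken from the paper.

```python
import random


def perturb_latent(latent, sigma, rng):
    """Return a copy of `latent` with i.i.d. Gaussian noise added.

    Assumption: Gaussian noise with standard deviation `sigma`; the
    paper's actual noise scheme is not given on this page.
    """
    return [z + rng.gauss(0.0, sigma) for z in latent]


# Hypothetical latent vector standing in for an encoder output.
rng = random.Random(0)
latent = [0.5, -1.2, 0.3, 2.0]

# During decoder fine-tuning, the decoder would see `noisy` instead of
# `latent`, while the reconstruction target stays the original image.
noisy = perturb_latent(latent, sigma=0.1, rng=rng)
```

The intent of such a scheme would be that the decoder learns to tolerate small deviations in its input, as generated latents rarely match encoder outputs exactly.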

Findings

01

LV-RAE achieves higher reconstruction fidelity.

02

It maintains semantic consistency in generated images.

03

Decoder robustness improvements reduce artifacts.

Abstract

Recent work leverages Vision Foundation Models as image encoders to boost the generative performance of latent diffusion models (LDMs), as their semantic feature distributions are easy to learn. However, such semantic features often lack low-level information (e.g., color and texture), leading to degraded reconstruction fidelity, which has emerged as a primary bottleneck in further scaling LDMs. To address this limitation, we propose LV-RAE, a representation autoencoder that augments semantic features with missing low-level information, enabling high-fidelity reconstruction while remaining highly aligned with the semantic distribution. We further observe that the resulting high-dimensional, information-rich latents make decoders sensitive to latent perturbations, causing severe artifacts when decoding generated latents and consequently degrading generation quality. Our analysis suggests…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Domain Adaptation and Few-Shot Learning