Fine-structure Preserved Real-world Image Super-resolution via Transfer VAE Training

Qiaosi Yi; Shuai Li; Rongyuan Wu; Lingchen Sun; Yuhui Wu; Lei Zhang

arXiv:2507.20291 · cs.CV · July 29, 2025

TL;DR

This paper introduces a Transfer VAE Training strategy to improve real-world image super-resolution by better preserving fine structures, reducing computational costs, and adapting pre-trained models for enhanced detail recovery.

Contribution

The proposed TVT method transfers the 8 $\times$ downsampling VAE of the SD model into a 4 $\times$ one while keeping its latent features aligned with the pre-trained UNet, and optimizes the network architecture for efficiency and detail preservation.

Findings

01

Significantly improves fine-structure preservation in super-resolution.

02

Reduces computational cost compared to existing diffusion models.

03

Achieves better detail recovery in real-world images.

Abstract

Impressive results on real-world image super-resolution (Real-ISR) have been achieved by employing pre-trained stable diffusion (SD) models. However, one critical issue of such methods lies in their poor reconstruction of image fine structures, such as small characters and textures, due to the aggressive resolution reduction of the VAE (e.g., 8 $\times$ downsampling) in the SD model. One solution is to employ a VAE with a lower downsampling rate for diffusion; however, adapting its latent features to the pre-trained UNet while mitigating the increased computational cost poses new challenges. To address these issues, we propose a Transfer VAE Training (TVT) strategy to transfer the 8 $\times$ downsampled VAE into a 4 $\times$ one while adapting to the pre-trained UNet. Specifically, we first train a 4 $\times$ decoder based on the output features of the original VAE encoder, then train a…
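The trade-off between downsampling rate and computational cost described in the abstract can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the 512 × 512 input size and the helper names `latent_hw` and `latent_positions` are assumptions chosen for illustration. It shows only that halving the downsampling factor from 8 to 4 doubles each latent dimension, so the UNet sees four times as many spatial positions while each position summarizes a smaller pixel patch.

```python
def latent_hw(height, width, factor):
    """Spatial size of the latent map for a VAE that downsamples by `factor`.

    Hypothetical helper for illustration; not part of any library.
    """
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by the downsampling factor")
    return height // factor, width // factor


def latent_positions(height, width, factor):
    """Number of spatial positions the diffusion UNet has to process."""
    h, w = latent_hw(height, width, factor)
    return h * w


if __name__ == "__main__":
    # Assumed input size; the abstract does not state one.
    height, width = 512, 512
    for factor in (8, 4):
        h, w = latent_hw(height, width, factor)
        pixels_per_position = factor * factor
        print(
            f"{factor}x VAE: latent {h}x{w}, "
            f"{latent_positions(height, width, factor)} positions, "
            f"{pixels_per_position} pixels per position"
        )
```

Under these assumptions, a small character that spans only a few pixels falls almost entirely inside one latent position at 8 $\times$, but is spread over several positions at 4 $\times$, at the price of a latent map with four times as many positions.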
