Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder

Xinmiao Lin; Yikang Li; Jenhao Hsiao; Chiuman Ho; Yu Kong

arXiv:2305.02541 · cs.CV · November 7, 2023 · 1 citation

TL;DR

This paper introduces FA-VAE, a frequency-augmented variational autoencoder that improves image reconstruction quality at high compression rates by recovering missing high-frequency details, and extends it to text-to-image synthesis with a new Cross-attention Autoregressive Transformer.
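
To make the compression argument concrete, here is a toy sketch (ours, not from the paper or the oppo-us-research/FA-VAE code). It assumes NumPy, uses a random-noise image, and treats block averaging followed by pixel repetition as a hypothetical stand-in for an encoder and decoder at a given downsampling factor. It then measures how much spectral energy above a normalized radial frequency of 0.25 (an arbitrary cutoff) survives.

```python
import numpy as np


def block_compress(img: np.ndarray, factor: int) -> np.ndarray:
    """Toy encoder/decoder (hypothetical): average factor x factor blocks,
    then upsample back to full size by repeating each block mean."""
    h, w = img.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    small = img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)


def high_freq_energy(img: np.ndarray, cutoff: float = 0.25) -> float:
    """Spectral energy at normalized radial frequency above `cutoff`
    (frequencies in cycles/pixel, range [-0.5, 0.5))."""
    spectrum = np.fft.fft2(img)
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)  # broadcasts to the spectrum's shape
    return float(np.sum(np.abs(spectrum[radius > cutoff]) ** 2))


rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))  # white noise: energy at all frequencies
reference = high_freq_energy(image)

kept_fractions = []
for factor in (2, 4, 8):  # larger factor = higher compression
    kept = high_freq_energy(block_compress(image, factor)) / reference
    kept_fractions.append(kept)
    print(f"factor {factor}: {kept:.3f} of high-frequency energy retained")
```

The retained fraction shrinks as the factor grows from 2 to 8, which mirrors the claim that higher compression discards more high-frequency detail. Learned encoders lose information differently, so this only illustrates the trend.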

Contribution

The paper proposes the Frequency Complement Module and Dynamic Spectrum Loss to improve VQ-VAE image reconstruction, and introduces the Cross-attention Autoregressive Transformer for better text-to-image synthesis.
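
This summary does not spell out how the Dynamic Spectrum Loss weights frequencies. To illustrate the general idea of a frequency-domain loss whose weights adapt during training, the sketch below swaps in a focal-frequency-style weighting, which is a different, previously published technique. Each frequency's weight is its current error magnitude raised to `alpha` and normalized to [0, 1]. The function name `adaptive_spectrum_loss` and the `alpha` parameter are hypothetical, and this is not the DSL defined in the paper.

```python
import numpy as np


def adaptive_spectrum_loss(pred: np.ndarray, target: np.ndarray, alpha: float = 1.0) -> float:
    """Frequency-domain loss whose per-frequency weights follow the current error.

    Focal-frequency-style weighting (swapped-in technique, not the paper's DSL):
    w = |F_pred - F_target| ** alpha, normalized so the largest weight is 1.
    """
    f_pred = np.fft.fft2(pred)
    f_target = np.fft.fft2(target)
    err = np.abs(f_pred - f_target)        # per-frequency error magnitude
    weights = err ** alpha                 # larger error -> larger weight
    if weights.max() > 0:
        weights = weights / weights.max()  # rescale to [0, 1]
    # In an autograd framework `weights` would be detached (treated as constant).
    return float(np.mean(weights * err ** 2))


rng = np.random.default_rng(1)
target = rng.standard_normal((32, 32))
# Blurred prediction: average each pixel with two shifted copies (a low-pass
# filter), which mostly damages high frequencies.
blurred = (target + np.roll(target, 1, axis=0) + np.roll(target, 1, axis=1)) / 3
noisy = target + 0.01 * rng.standard_normal(target.shape)

print(adaptive_spectrum_loss(target, target))  # 0.0 for a perfect reconstruction
print(adaptive_spectrum_loss(noisy, target), adaptive_spectrum_loss(blurred, target))
```

Because the weights are recomputed from the current error on every call, frequencies that are already reconstructed well contribute less, and the emphasis shifts as training progresses.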

Findings

1. FA-VAE outperforms state-of-the-art methods in image reconstruction fidelity.
2. The proposed models improve semantic alignment in text-to-image synthesis.
3. Experiments show robustness across various compression rates and datasets.

Abstract

The popular VQ-VAE models reconstruct images by learning a discrete codebook, but their reconstruction quality degrades rapidly as the compression rate rises. One major reason is that a higher compression rate causes more loss of visual signal in the higher-frequency spectrum, which reflects details in pixel space. In this paper, a Frequency Complement Module (FCM) architecture is proposed to capture the missing frequency information and thereby enhance reconstruction quality. The FCM can be easily incorporated into the VQ-VAE structure, and we refer to the new model as Frequency Augmented VAE (FA-VAE). In addition, a Dynamic Spectrum Loss (DSL) is introduced to guide the FCMs to balance dynamically between various frequencies for optimal reconstruction. FA-VAE is further extended to the text-to-image synthesis task, and a Cross-attention…
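
The "discrete codebook" in the first sentence refers to the standard VQ-VAE quantization step. Each encoder output vector is replaced by its nearest entry in a learned codebook, and only that entry's index is stored. Below is a minimal NumPy sketch of the lookup. The function name and the toy codebook values are assumptions for illustration, not taken from the FA-VAE repository.

```python
import numpy as np


def vector_quantize(z: np.ndarray, codebook: np.ndarray):
    """Nearest-neighbor lookup used in VQ-VAE-style quantization.

    z:        (N, D) encoder output vectors
    codebook: (K, D) codebook embeddings
    Returns (indices of shape (N,), quantized vectors of shape (N, D)).
    """
    # ||z - e||^2 = ||z||^2 - 2 z.e + ||e||^2, computed for every (z, e) pair
    dists = (
        np.sum(z ** 2, axis=1, keepdims=True)
        - 2.0 * z @ codebook.T
        + np.sum(codebook ** 2, axis=1)
    )
    indices = np.argmin(dists, axis=1)  # each index needs only log2(K) bits
    return indices, codebook[indices]


# Toy 2-D codebook with K = 3 entries (values chosen only for illustration).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
z = np.array([[0.1, -0.2], [0.9, 1.2], [4.0, 6.0]])
indices, z_q = vector_quantize(z, codebook)
print(indices.tolist())  # [0, 1, 2]
```

In VQ-VAE training the non-differentiable `argmin` is usually bypassed with a straight-through gradient estimator, together with codebook and commitment loss terms. Those parts are omitted here.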

Code & Models

Repositories

oppo-us-research/FA-VAE
PyTorch · Official

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · AI in cancer detection

Methods: Attention Is All You Need · Adam · Layer Normalization · Linear Layer · Label Smoothing · Dropout · Byte Pair Encoding · Multi-Head Attention · Dense Connections · Absolute Position Encodings