Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder
Xinmiao Lin, Yikang Li, Jenhao Hsiao, Chiuman Ho, Yu Kong

TL;DR
This paper introduces FA-VAE, a frequency-augmented variational autoencoder that enhances image reconstruction quality at high compression rates by capturing missing high-frequency details, and extends it to text-to-image synthesis with a new transformer model.
Contribution
The paper proposes the Frequency Complement Module and Dynamic Spectrum Loss to improve VQ-VAE image reconstruction, and introduces the Cross-attention Autoregressive Transformer for better text-to-image synthesis.
Findings
FA-VAE outperforms state-of-the-art methods in image reconstruction fidelity.
The proposed models improve semantic alignment in text-to-image synthesis.
Experiments show robustness across various compression rates and datasets.
Abstract
The popular VQ-VAE models reconstruct images through learning a discrete codebook but suffer from a significant issue in the rapid quality degradation of image reconstruction as the compression rate rises. One major reason is that a higher compression rate induces more loss of visual signals on the higher frequency spectrum which reflect the details on pixel space. In this paper, a Frequency Complement Module (FCM) architecture is proposed to capture the missing frequency information for enhancing reconstruction quality. The FCM can be easily incorporated into the VQ-VAE structure, and we refer to the new model as Frequency Augmented VAE (FA-VAE). In addition, a Dynamic Spectrum Loss (DSL) is introduced to guide the FCMs to balance between various frequencies dynamically for optimal reconstruction. FA-VAE is further extended to the text-to-image synthesis task, and a Cross-attention…
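The abstract's premise, that stronger compression discards more of the high-frequency content, can be illustrated with a small NumPy sketch. This is not the paper's FCM or DSL: average pooling followed by pixel repetition stands in for a compressing bottleneck, and the function names, the 0.1 and 0.25 cycles-per-pixel band cutoffs, and the random test image are all assumptions made for illustration.

```python
import numpy as np


def pool_and_restore(img, factor):
    """Average-pool by `factor`, then upsample back by pixel repetition.

    A crude stand-in for a compressing bottleneck (an assumption, not the
    paper's encoder/decoder).
    """
    h, w = img.shape
    pooled = img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(pooled, factor, axis=0), factor, axis=1)


def band_errors(img, recon, low_cut=0.1, high_cut=0.25):
    """Relative spectral error in a low band and in a high band.

    Frequencies are in cycles per pixel; both cutoffs are illustrative.
    """
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    err = np.abs(np.fft.fft2(img) - np.fft.fft2(recon)) ** 2
    ref = np.abs(np.fft.fft2(img)) ** 2
    low, high = radius <= low_cut, radius > high_cut
    return err[low].sum() / ref[low].sum(), err[high].sum() / ref[high].sum()


rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))  # synthetic stand-in for an image

results = {}
for factor in (2, 4, 8):
    low_err, high_err = band_errors(image, pool_and_restore(image, factor))
    results[factor] = (low_err, high_err)
    print(f"factor {factor}: low-band error {low_err:.3f}, high-band error {high_err:.3f}")
```

On this synthetic input the high-band error should exceed the low-band error at every factor, and both should grow as the factor increases. Natural images concentrate more energy at low frequencies, so the absolute numbers would differ there.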
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · AI in cancer detection
Methods: Attention Is All You Need · Adam · Layer Normalization · Linear Layer · Label Smoothing · Dropout · Byte Pair Encoding · Multi-Head Attention · Dense Connections · Absolute Position Encodings
