SwinStyleformer is a favorable choice for image inversion

Jiawei Mao; Guangyi Zhao; Xuesong Yin; Yuanqi Chang

arXiv:2406.13153·cs.CV·June 21, 2024

TL;DR

SwinStyleformer is a pure Transformer-based image inversion network that captures both local detail and global structure. It outperforms CNN-based methods by addressing their difficulty with long-range dependencies.

Contribution

The paper presents SwinStyleformer, a novel Transformer-based inversion network with multi-scale connections and learnable query blocks, achieving state-of-the-art results in image inversion.

Findings

01 Successfully addresses Transformer inversion failure.
02 Achieves state-of-the-art performance in image inversion.
03 Enhances local detail and global structure understanding.

Abstract

This paper proposes SwinStyleformer, the first inversion network with a pure Transformer structure. It compensates for the shortcomings of CNN-based inversion frameworks by handling long-range dependencies and learning the global structure of objects. Experiments showed that an inversion network with a Transformer backbone could not successfully invert images. This failure arises from differences between CNNs and Transformers: compared with convolution, self-attention weights favor image structure and ignore image details; the Transformer lacks multi-scale properties; and the distribution of the latent code extracted by the Transformer differs from that of the StyleGAN style vector. To address these differences, we employ the Swin Transformer with a smaller window size as the backbone of the SwinStyleformer to enhance the local detail of the inversion…
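The abstract's point about window size can be illustrated with a small sketch of Swin-style window partitioning. This is not the authors' code: the 8x8 grid, the window sizes 4 and 2, and the helper names `window_partition` and `attention_pairs` are all assumptions chosen for illustration, and shifted windows are left out. It only shows that a smaller window restricts each attention computation to a tighter neighbourhood, which is the general intuition behind favouring local detail.

```python
from typing import List, Sequence


def window_partition(grid: Sequence[Sequence[int]], window_size: int) -> List[List[int]]:
    """Split a 2-D token grid into non-overlapping square windows.

    Each window is returned as a flat list of tokens in row-major order.
    Hypothetical helper for illustration; not taken from the paper.
    """
    height, width = len(grid), len(grid[0])
    if height % window_size or width % window_size:
        raise ValueError("grid height and width must be divisible by window_size")
    windows = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            windows.append([
                grid[row][col]
                for row in range(top, top + window_size)
                for col in range(left, left + window_size)
            ])
    return windows


def attention_pairs(height: int, width: int, window_size: int) -> int:
    """Count query-key pairs scored when attention is confined to each window."""
    tokens_per_window = window_size * window_size
    num_windows = (height // window_size) * (width // window_size)
    return num_windows * tokens_per_window * tokens_per_window


# An 8x8 grid whose entries are just token indices (assumed toy input).
grid = [[row * 8 + col for col in range(8)] for row in range(8)]

large = window_partition(grid, 4)  # 4 windows of 16 tokens each
small = window_partition(grid, 2)  # 16 windows of 4 tokens each

print(len(large), len(large[0]))
print(len(small), len(small[0]))
print(attention_pairs(8, 8, 4), attention_pairs(8, 8, 2))
```

With the smaller window, every token attends only to its immediate 2x2 neighbourhood, so the attention weights are spread over fewer, closer positions. The actual window sizes used by SwinStyleformer are not stated in the text above.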

Taxonomy

Topics: Ultrasound Imaging and Elastography · Infrared Thermography in Medicine

Methods: Linear Layer · Stochastic Depth · Multi-Head Attention · Residual Connection · Convolution · Softmax · Layer Normalization · Focus · Byte Pair Encoding · Label Smoothing