StylePrompter: All Styles Need Is Attention

Chenyi Zhuang; Pan Gao; Aljosa Smolic

arXiv:2307.16151·cs.CV·August 1, 2023

TL;DR

StylePrompter introduces a Transformer-based approach for GAN inversion that enhances image reconstruction and editing flexibility by leveraging hierarchical vision Transformers and style-driven refinement.

Contribution

It pioneers the use of a hierarchical vision Transformer backbone and a Style-driven Multi-scale Adaptive Refinement Transformer for improved GAN inversion and editing.

Findings

01

Achieves high-quality image inversion with balanced reconstruction and editability.

02

Outperforms existing inversion methods that involve the $F$ space.

03

Demonstrates adaptability to various image editing tasks.

Abstract

GAN inversion aims to invert given images into the corresponding latent codes of Generative Adversarial Networks (GANs), especially StyleGAN, whose disentangled latent space allows attribute-based image manipulation at the latent level. As most inversion methods build upon Convolutional Neural Networks (CNNs), we instead adapt a hierarchical vision Transformer backbone to predict $W^{+}$ latent codes at the token level. We further apply a Style-driven Multi-scale Adaptive Refinement Transformer (SMART) in $F$ space to refine the intermediate style features of the generator. By treating style features as queries that retrieve lost identity information from the encoder's feature maps, SMART not only produces high-quality inverted images but also, surprisingly, adapts to editing tasks. We then prove that StylePrompter lies in a more disentangled $W^{+}$ …
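
The "style features as queries" idea in the abstract can be illustrated with plain cross-attention. The sketch below is not the authors' SMART module. It is a minimal single-head example in dependency-free Python, assuming identity projections (no learned query, key, or value weights), a single scale, and a simple residual update. The names `cross_attention`, `style_tokens`, and `encoder_tokens` are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def cross_attention(style_tokens, encoder_tokens):
    """Single-head cross-attention with a residual connection.

    style_tokens:   N_s x d list of lists, used as queries.
    encoder_tokens: N_e x d list of lists, used as keys and values.
    Assumption: identity projections stand in for learned weights.
    """
    d = len(style_tokens[0])
    # Scaled dot-product scores between each query and every key: N_s x N_e.
    scores = matmul(style_tokens, transpose(encoder_tokens))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    # Each query retrieves a weighted mix of encoder tokens: N_s x d.
    retrieved = matmul(weights, encoder_tokens)
    # Residual update (an assumption): add the retrieved detail to the query.
    refined = [
        [q + r for q, r in zip(q_row, r_row)]
        for q_row, r_row in zip(style_tokens, retrieved)
    ]
    return refined, weights


random.seed(0)
dim = 8
# Toy stand-ins: 4 flattened style-feature tokens, 16 encoder feature-map tokens.
style_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(4)]
encoder_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(16)]

refined, weights = cross_attention(style_tokens, encoder_tokens)
print(len(refined), len(refined[0]))
```

Each row of `weights` is a probability distribution over encoder tokens, so every refined style token is its original value plus a convex combination of encoder features. A real refinement module would add learned projections, multiple heads, and features from several encoder scales.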

Code & Models

Repositories

i2-multimedia-lab/styleprompter
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Position-Wise Feed-Forward Layer · Adaptive Instance Normalization · R1 Regularization · Linear Layer · Label Smoothing · Dropout