# Learning Input-agnostic Manipulation Directions in StyleGAN with Text Guidance

**Authors:** Yoonjeon Kim, Hyunsu Kim, Junho Kim, Yunjey Choi, Eunho Yang

arXiv: 2302.13331 · 2023-02-28

## TL;DR

This paper introduces a method for text-guided, input-agnostic image manipulation in StyleGAN that accounts for interactions among multiple channels, recovering more diverse manipulation directions while keeping real-time inference speed.

## Contribution

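The paper proposes a dictionary learning approach in which each entry still represents a single StyleGAN channel, but is learned by taking into account the manipulation effect of that channel's interaction with multiple other channels. This makes text-guided StyleGAN manipulation more versatile than a dictionary built by controlling each channel in isolation.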

## Key findings

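- Finds diverse known directions from unsupervised methods, as well as unknown directions from random text, that the prior single-channel dictionary fails to discover.
- Maintains real-time inference speed.
- Maintains disentanglement ability in image editing.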

## Abstract

With the advantages of fast inference and human-friendly flexible manipulation, image-agnostic style manipulation via text guidance enables new applications that were not previously available. The state-of-the-art text-guided image-agnostic manipulation method embeds the representation of each channel of StyleGAN independently in the Contrastive Language-Image Pre-training (CLIP) space, and provides it in the form of a Dictionary to quickly find the channel-wise manipulation direction at inference time. However, in this paper we argue that this dictionary, which is constructed by controlling each channel individually, is limited in its ability to accommodate the versatility of text guidance, since the collective and interactive relations among multiple channels are not considered. Indeed, we show that it fails to discover a large portion of the manipulation directions that can be found by existing methods that manually manipulate the latent space without text. To alleviate this issue, we propose a novel method that learns a Dictionary, whose entry corresponds to the representation of a single channel, by taking into account the manipulation effect coming from the interaction with multiple other channels. We demonstrate that our strategy resolves the inability of previous methods to find diverse known directions from unsupervised methods and unknown directions from random text, while maintaining the real-time inference speed and disentanglement ability.
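
To make the dictionary lookup described in the abstract more concrete, here is a minimal sketch of how a channel-wise manipulation direction could be read off a dictionary at inference time. It is an illustration only, not the authors' implementation: the function name `manipulation_direction`, the array shapes, the thresholding step, and the peak normalization are all assumptions, and the dictionary and text direction are random placeholders rather than real StyleGAN or CLIP outputs.

```python
import numpy as np


def manipulation_direction(dictionary, text_direction, threshold=0.1):
    """Score each channel against a text direction (hypothetical helper).

    dictionary:     (num_channels, clip_dim) array, one CLIP-space entry
                    per StyleGAN channel (assumed layout).
    text_direction: (clip_dim,) array, e.g. the difference between the CLIP
                    embeddings of a target and a neutral prompt (assumed).
    threshold:      channels scoring below this magnitude are zeroed
                    (an assumed way to keep the edit sparse).
    """
    # Normalize the text direction so scores do not depend on its length.
    unit_text = text_direction / np.linalg.norm(text_direction)
    # One inner product per channel: how well that entry aligns with the text.
    scores = dictionary @ unit_text
    # Keep only channels with a sufficiently strong response.
    scores = np.where(np.abs(scores) >= threshold, scores, 0.0)
    # Rescale so the strongest remaining channel has magnitude 1.
    peak = np.max(np.abs(scores))
    return scores / peak if peak > 0 else scores


# Placeholder data standing in for a real dictionary and a real text embedding.
rng = np.random.default_rng(0)
num_channels, clip_dim = 6, 4
dictionary = rng.normal(size=(num_channels, clip_dim))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)
text_direction = rng.normal(size=clip_dim)

direction = manipulation_direction(dictionary, text_direction, threshold=0.3)
print(direction)
```

Because the lookup is a single matrix-vector product, it is cheap at inference time regardless of how the dictionary entries were obtained. Under this reading, the difference between a dictionary built from isolated channels and one learned with channel interactions would lie in the entries themselves rather than in the lookup.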

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.13331/full.md

## Figures

25 figures with captions in the complete paper: https://tomesphere.com/paper/2302.13331/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/2302.13331/full.md

---
Source: https://tomesphere.com/paper/2302.13331