TL;DR
This paper presents a method for building an open-ended, human-interpretable vocabulary of visual concepts in GAN latent spaces, enabling fine-grained, interpretable image manipulation.
Contribution
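It introduces a three-component approach: automatic identification of perceptually salient latent directions based on their layer selectivity, human annotation of those directions with free-form natural language, and decomposition of the annotations into a reliable, composable vocabulary of visual concepts.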
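The decomposition step can be illustrated with a minimal sketch. It assumes a linear bag-of-words model in which each annotated latent direction is approximately the sum of one direction per word in its annotation. That model, the helper name `decompose_annotations`, and the toy words and vectors are all illustrative assumptions, not the paper's exact objective. Under this assumption, the per-word concept directions follow from an ordinary least-squares fit:

```python
import numpy as np

def decompose_annotations(directions, annotations, vocab):
    """Fit one latent direction per vocabulary word (hypothetical helper).

    Assumes a linear bag-of-words model: each annotated direction is
    approximately the sum of the directions of the words in its annotation.
    directions: (n, k) array of annotated latent directions.
    annotations: list of n word lists.
    vocab: list of V words.
    Returns a dict mapping each word to a (k,) concept direction.
    """
    index = {w: i for i, w in enumerate(vocab)}
    # Binary design matrix: A[i, j] = 1 if word j appears in annotation i.
    A = np.zeros((len(annotations), len(vocab)))
    for i, words in enumerate(annotations):
        for w in words:
            A[i, index[w]] = 1.0
    # Solve A @ C ~= directions in the least-squares sense for the (V, k) matrix C.
    C, *_ = np.linalg.lstsq(A, directions, rcond=None)
    return {w: C[index[w]] for w in vocab}

# Toy data (made up): three ground-truth word directions in a 3-D latent space.
true = {"snowy": np.array([1.0, 0.0, 0.0]),
        "dark": np.array([0.0, 1.0, 0.0]),
        "foggy": np.array([0.0, 0.0, 1.0])}
annotations = [["snowy"], ["dark"], ["snowy", "dark"], ["foggy", "dark"], ["foggy"]]
# Each annotated direction is synthesized as the sum of its words' directions.
directions = np.array([sum(true[w] for w in a) for a in annotations])
concepts = decompose_annotations(directions, annotations, list(true))
```

Because the toy directions are exact sums, the fit recovers the original word directions. With real annotations the system is noisy and the least-squares solution gives the best additive approximation under this model.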
Findings
Concepts are reliable and generalize across classes and observers.
The vocabulary enables fine-grained manipulation of image style and content.
Concepts are interpretable and composable.
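Composability can be pictured as adding weighted concept directions to a latent code before it is passed to the generator. The sketch below assumes concepts combine additively in latent space. The helper `compose_edit`, the concept names, and the 3-D vectors are hypothetical toy values, not taken from the paper:

```python
import numpy as np

def compose_edit(z, concepts, weights):
    """Edit a latent code with a weighted sum of concept directions (hypothetical helper).

    Assumes concepts combine additively in latent space.
    """
    z = np.asarray(z, dtype=float)
    edit = np.zeros_like(z)
    for word, alpha in weights.items():
        # Scale each concept direction by its requested strength and accumulate.
        edit += alpha * np.asarray(concepts[word], dtype=float)
    return z + edit

# Toy 3-D concept directions (made up for illustration).
concepts = {"snowy": [1.0, 0.0, 0.0], "dark": [0.0, 1.0, 0.0]}
z = np.array([0.2, 0.2, 0.2])
# Strengthen one concept and weaken another in a single edit.
z_edited = compose_edit(z, concepts, {"snowy": 2.0, "dark": -0.5})
```

Negative weights push the image away from a concept, and the edit for several words is just the sum of their individual edits, which is what an additive notion of composability implies.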
Abstract
A large body of recent work has identified transformations in the latent spaces of generative adversarial networks (GANs) that consistently and interpretably transform generated images. But existing techniques for identifying these transformations rely on either a fixed vocabulary of pre-specified visual concepts, or on unsupervised disentanglement techniques whose alignment with human judgments about perceptual salience is unknown. This paper introduces a new method for building open-ended vocabularies of primitive visual concepts represented in a GAN's latent space. Our approach is built from three components: (1) automatic identification of perceptually salient directions based on their layer selectivity; (2) human annotation of these directions with free-form, compositional natural language descriptions; and (3) decomposition of these annotations into a visual concept vocabulary,…
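Component (1) ranks directions by how strongly they affect a particular layer. As a simplified stand-in for that layer-selectivity criterion (not the paper's definition), suppose a layer responds linearly to the latent, producing `W @ z`. Then the unit directions that change that layer's output most are the top right singular vectors of `W`, and their gains are the singular values. The helper name and toy weights below are assumptions for illustration:

```python
import numpy as np

def top_layer_directions(W, num_directions):
    """Unit latent directions that most change a linear layer's output W @ z.

    Stand-in assumption: the layer responds linearly to the latent, so the
    gain ||W @ d|| over unit vectors d is largest for the top right singular
    vectors of W, and those gains are the corresponding singular values.
    """
    # Rows of Vt are right singular vectors, sorted by decreasing singular value.
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    return Vt[:num_directions], s[:num_directions]

# Toy layer weights (made up): the first latent axis has the largest effect.
W = np.array([[3.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.5],
              [0.0, 0.0, 0.0]])
directions, gains = top_layer_directions(W, 2)
```

Singular vectors are defined only up to sign, so `directions[0]` may come back as either the positive or the negative first axis. Both describe the same edit direction.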
