An Image is Worth More Than a Thousand Words: Towards Disentanglement in the Wild

Aviv Gabbay; Niv Cohen; Yedid Hoshen

arXiv:2106.15610 · cs.CV · October 26, 2021 · 5 cites

TL;DR

This paper introduces a method for disentangling factors of variation in images using limited supervision and off-the-shelf image descriptors, enabling effective manipulation of real-world images with minimal manual annotation.

Contribution

It proposes a novel approach that combines partial labeling and residual factor separation, leveraging CLIP embeddings for zero-shot attribute annotation in real images.

Findings

01

Achieves state-of-the-art disentangled image manipulation results.

02

Demonstrates effectiveness on synthetic benchmarks and real face images.

03

Uses minimal manual effort for attribute annotation.

Abstract

Unsupervised disentanglement has been shown to be theoretically impossible without inductive biases on the models and the data. As an alternative approach, recent methods rely on limited supervision to disentangle the factors of variation and allow their identifiability. While annotating the true generative factors is only required for a limited number of observations, we argue that it is infeasible to enumerate all the factors of variation that describe a real-world image distribution. To this end, we propose a method for disentangling a set of factors which are only partially labeled, as well as separating the complementary set of residual factors that are never explicitly specified. Our success in this challenging setting, demonstrated on synthetic benchmarks, gives rise to leveraging off-the-shelf image descriptors to partially annotate a subset of attributes in real image domains…
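
The idea of using an off-the-shelf image descriptor to annotate only part of a dataset can be illustrated with a minimal sketch. It assumes image and text-prompt embeddings have already been computed by some joint image-text model (CLIP is the one named in the summary above); the toy 2-D vectors, the function names, and the confidence threshold and temperature are illustrative assumptions, not the authors' implementation or settings. Images whose best-matching prompt is not confident enough stay unlabeled.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def softmax(scores, temperature):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def partial_labels(image_embs, prompt_embs, threshold=0.9, temperature=0.01):
    """Assign each image the index of its best-matching prompt,
    or None when the match is below the (assumed) confidence threshold."""
    labels = []
    for img in image_embs:
        sims = [cosine(img, p) for p in prompt_embs]
        probs = softmax(sims, temperature)
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best if probs[best] >= threshold else None)
    return labels

# Toy 2-D embeddings standing in for real descriptor outputs (hypothetical).
prompts = [[1.0, 0.0], [0.0, 1.0]]             # two values of one attribute
images = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]  # the last one is ambiguous
labels = partial_labels(images, prompts)
print(labels)  # [0, 1, None]
```

Only the first two images receive a label; the ambiguous third one is left unannotated, which is the partially labeled setting the abstract describes.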

Code & Models

Repositories

avivga/zerodim · PyTorch · Official

Videos

An Image is Worth More Than a Thousand Words: Towards Disentanglement in The Wild · SlidesLive

Taxonomy

Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques