Unsupervised Robust Disentangling of Latent Characteristics for Image Synthesis
Patrick Esser, Johannes Haux, Björn Ommer

TL;DR
This paper introduces a novel unsupervised method for disentangling appearance and pose in images, enabling high-quality image synthesis and retargeting without pose annotations, outperforming existing methods.
Contribution
The approach learns disentangled representations of appearance and pose using only pairs of images that depict the same object appearance. A classifier estimates the minimal amount of regularization needed to keep the two representations independent, which lets the method surpass prior adversarial and variational approaches.
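The idea of letting a classifier's estimate set the regularization strength can be illustrated with a small control loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `update_reg_weight`, the scalar dependence estimate `dependence`, the tolerance `eps`, and the step size `lr` are all hypothetical and do not come from the paper summary.

```python
def update_reg_weight(weight, dependence, eps=0.1, lr=0.01):
    """Hypothetical controller for a regularization weight.

    `dependence` stands in for a classifier's estimate of how much the
    pose and appearance codes still depend on each other (assumption).
    The weight grows while the estimate exceeds the tolerance `eps`
    and shrinks otherwise, so only as much regularization is applied
    as is needed. The weight is clipped to stay non-negative.
    """
    new_weight = weight + lr * (dependence - eps)
    return max(new_weight, 0.0)


# Usage: feed in a sequence of (made-up) dependence estimates.
weight = 1.0
for estimate in [0.5, 0.3, 0.05]:
    weight = update_reg_weight(weight, estimate)
```

The design choice in this sketch is a simple proportional update: the further the estimated dependence is above the tolerance, the faster the regularization weight increases, and it relaxes again once the representations look independent.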
Findings
Successfully recombines pose and appearance for novel image synthesis.
Achieves significant improvements over state-of-the-art unsupervised methods.
Performs comparably to pose-supervised approaches without requiring pose annotations.
Abstract
Deep generative models come with the promise to learn an explainable representation for visual objects that allows image sampling, synthesis, and selective modification. The main challenge is to learn to properly model the independent latent characteristics of an object, especially its appearance and pose. We present a novel approach that learns disentangled representations of these characteristics and explains them individually. Training requires only pairs of images depicting the same object appearance, but no pose annotations. We propose an additional classifier that estimates the minimal amount of regularization required to enforce disentanglement. Thus both representations together can completely explain an image while being independent of each other. Previous methods based on adversarial approaches fail to enforce this independence, while methods based on variational approaches…
