Group-based Learning of Disentangled Representations with Generalizability for Novel Contents
Haruo Hosoya

TL;DR
This paper introduces a group-based variational autoencoder that learns disentangled representations without explicit labels, using only groups of data instances that share the same content but are transformed differently. The model separates content from transformation factors and generalizes to unseen contents.
Contribution
The model learns disentangled representations from grouped data without attribute labels, which allows it to generalize to novel contents and to better separate content and transformation factors.
Findings
Successfully learned content representations that are separate from transformations.
Achieved generalization to unseen contents across five datasets.
Provided insights into the model's invariance and disentanglement mechanisms.
Abstract
Sensory data are often composed of independent content and transformation factors. For example, face images may have shapes as content and poses as transformation. To infer these factors separately from given data, various "disentangling" models have been proposed. However, many of these are supervised or semi-supervised, either requiring attribute labels that are often unavailable or disallowing generalization over new contents. In this study, we introduce a novel deep generative model, called group-based variational autoencoders. In this model, we assume no explicit labels, but a weaker form of structure that groups together data instances having the same content but transformed differently; we thereby separately estimate a group-common factor as content and an instance-specific factor as transformation. This approach allows for learning to represent a general continuous space of…
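The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the linear encoders, the weight matrices, and the choice to average per-instance content estimates over the group are all assumptions made here for illustration, and the function names are hypothetical. The sketch only shows the structural point that a group yields one shared content vector and one transformation vector per instance.

```python
from statistics import fmean


def encode_instance(x, w_content, w_transform):
    """Hypothetical linear encoder: maps one instance to
    (content estimate, transformation estimate)."""
    content = [sum(w * xi for w, xi in zip(row, x)) for row in w_content]
    transform = [sum(w * xi for w, xi in zip(row, x)) for row in w_transform]
    return content, transform


def encode_group(group, w_content, w_transform):
    """Encode a group of instances assumed to share the same content.

    Returns one group-common content vector and a list of
    instance-specific transformation vectors.
    """
    encoded = [encode_instance(x, w_content, w_transform) for x in group]
    contents = [c for c, _ in encoded]
    transforms = [t for _, t in encoded]
    # Assumption: the group-common content is the mean of the
    # per-instance content estimates.
    group_content = [fmean(dim) for dim in zip(*contents)]
    return group_content, transforms


# Toy weights (assumed): the first two input dimensions carry content,
# the third carries transformation.
W_CONTENT = [[1, 0, 0, 0], [0, 1, 0, 0]]
W_TRANSFORM = [[0, 0, 1, 0]]

# Three instances that are assumed to show the same content
# under different transformations.
GROUP = [
    [1, 2, 0.1, 0],
    [3, 2, 0.5, 0],
    [2, 2, -0.3, 0],
]

group_content, transforms = encode_group(GROUP, W_CONTENT, W_TRANSFORM)
```

In this toy setup, `group_content` is a single two-dimensional vector for the whole group, while `transforms` holds one vector per instance. A real model would replace the linear maps with learned neural encoders and probabilistic latent variables.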
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Music and Audio Processing
