DSRGAN: Explicitly Learning Disentangled Representation of Underlying Structure and Rendering for Image Generation without Tuple Supervision
Guang-Yuan Hao, Hong-Xing Yu, Wei-Shi Zheng

TL;DR
This paper introduces DSRGAN, a novel GAN-based method that learns to disentangle underlying spatial structure from rendering in images without tuple supervision, enabling independent control of these factors.
Contribution
The paper proposes a new framework with a shared latent space and parallel generative networks to achieve disentangled representation learning without tuple supervision.
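The shared-latent-space idea can be illustrated with a minimal sketch. This is not the authors' architecture: the two "generators" below are untrained random linear maps, there is no adversarial training, and every name and dimension (`STRUCT_DIM`, `RENDER_DIM`, `make_generator`, and so on) is a hypothetical choice made for illustration. It shows only the wiring: one structure code feeds both parallel generators, while each domain keeps its own rendering code.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
STRUCT_DIM, RENDER_DIM, IMAGE_DIM = 8, 4, 16

rng = np.random.default_rng(0)


def make_generator(rng):
    """Build a toy generator: a fixed random linear map followed by tanh.

    A real model would use a trained deep network; this stand-in only
    demonstrates how the latent codes are combined.
    """
    weights = rng.normal(size=(STRUCT_DIM + RENDER_DIM, IMAGE_DIM))

    def generate(z_structure, z_rendering):
        # Concatenate the shared structure code with a domain-specific
        # rendering code, then map the result to a flat "image" vector.
        z = np.concatenate([z_structure, z_rendering], axis=-1)
        return np.tanh(z @ weights)

    return generate


# Two parallel generators, one per domain (target and auxiliary).
generate_target = make_generator(rng)
generate_auxiliary = make_generator(rng)

# One shared structure code, plus one rendering code per domain.
z_structure = rng.normal(size=(1, STRUCT_DIM))
z_render_target = rng.normal(size=(1, RENDER_DIM))
z_render_auxiliary = rng.normal(size=(1, RENDER_DIM))

# Both domains are generated from the same structure code.
image_target = generate_target(z_structure, z_render_target)
image_auxiliary = generate_auxiliary(z_structure, z_render_auxiliary)

# Independent control: keep the structure code fixed and resample only
# the rendering code of the target domain.
z_render_new = rng.normal(size=(1, RENDER_DIM))
restyled_target = generate_target(z_structure, z_render_new)
```

In a trained model of this kind, `image_target` and `restyled_target` would be expected to share spatial structure while differing in rendering; the random weights used here carry no such guarantee.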
Findings
DSRGAN outperforms state-of-the-art methods in disentanglability.
The paper introduces the Normalized Disentanglability metric for quantitative evaluation.
It demonstrates effective independent control of structure and rendering in image generation.
Abstract
We focus on explicitly learning a disentangled representation for natural image generation, in which the underlying spatial structure and the rendering on that structure can be controlled independently, without using tuple supervision. The setting is significant because tuple supervision is costly and sometimes unavailable. However, the task is highly unconstrained and thus ill-posed. To address this problem, we propose to introduce an auxiliary domain that shares a common underlying-structure space with the target domain, and we make a partially shared latent space assumption. The key idea is to encourage the partially shared latent variable to represent similar underlying spatial structures in both domains, while the two domain-specific latent variables are left to represent the renderings of the two domains respectively. This is achieved by designing two…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Processing and 3D Reconstruction · Digital Media Forensic Detection
