TL;DR
This paper introduces a novel GAN-based model called CoDe that learns to compose realistic images from two objects, capturing their interactions and spatial relationships, which improves scene synthesis quality.
Contribution
The paper proposes a self-consistent Composition-by-Decomposition network that explicitly models interactions between objects in scene composition tasks.
Findings
The model generates realistic composite images from object pairs.
It captures interactions like spatial layout and occlusion.
Qualitative and user evaluations support its effectiveness.
Abstract
Generative Adversarial Networks (GANs) can produce images of remarkable complexity and realism but are generally structured to sample from a single latent source ignoring the explicit spatial interaction between multiple entities that could be present in a scene. Capturing such complex interactions between different objects in the world, including their relative scaling, spatial layout, occlusion, or viewpoint transformation is a challenging problem. In this work, we propose a novel self-consistent Composition-by-Decomposition (CoDe) network to compose a pair of objects. Given object images from two distinct distributions, our model can generate a realistic composite image from their joint distribution following the texture and shape of the input objects. We evaluate our approach through qualitative experiments and user evaluations. Our results indicate that the learned model captures…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
