Towards Improving the Generation Quality of Autoregressive Slot VAEs
Patrick Emami, Pan He, Sanjay Ranka, Anand Rangarajan

TL;DR
This paper enhances autoregressive slot VAEs by conditioning the slots on a global scene-level variable and learning a consistent object order, significantly improving unconditional scene generation quality.
Contribution
It introduces two improvements, scene-level conditioning and a learned object order, to better model object correlations in scene generation.
Findings
Improved unconditional scene generation quality across three environments.
Validated effectiveness of scene-level conditioning and object ordering through ablation studies.
Achieved significant gains over baseline models in generating coherent multi-object scenes.
Abstract
Unconditional scene inference and generation are challenging to learn jointly with a single compositional model. Despite encouraging progress on models that extract object-centric representations ("slots") from images, unconditional generation of scenes from slots has received less attention. This is primarily because learning the multi-object relations necessary to imagine coherent scenes is difficult. We hypothesize that most existing slot-based models have a limited ability to learn object correlations. We propose two improvements that strengthen object correlation learning. The first is to condition the slots on a global, scene-level variable that captures higher-order correlations between slots. Second, we address the fundamental lack of a canonical order for objects in images by proposing to learn a consistent order to use for the autoregressive generation of scene objects.…
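The generation scheme described in the abstract can be illustrated with a toy ancestral sampler: draw a scene-level variable first, then draw each slot conditioned on that variable and on the slots already sampled. This is only a minimal sketch under stated assumptions, not the authors' architecture. The function name `sample_slots`, the random linear maps `W_g` and `W_h`, the summed slot history, and the unit-variance Gaussians are all hypothetical stand-ins for networks a trained model would learn. The learned object order is not modeled here; in this sketch the loop index simply plays the role of that order.

```python
import numpy as np

def sample_slots(num_slots, slot_dim, global_dim, rng):
    """Toy ancestral sampling: z_g first, then each slot given z_g and earlier slots."""
    # Hypothetical stand-in parameters (assumption); a real model learns these.
    W_g = rng.normal(scale=0.1, size=(global_dim, slot_dim))
    W_h = rng.normal(scale=0.1, size=(slot_dim, slot_dim))

    z_g = rng.normal(size=global_dim)   # global, scene-level variable
    h = np.zeros(slot_dim)              # running summary of previously sampled slots
    slots = []
    for _ in range(num_slots):
        # Conditional prior mean depends on the scene variable and on earlier slots.
        mean = np.tanh(z_g @ W_g + h @ W_h)
        z_k = mean + rng.normal(size=slot_dim)  # unit variance, for simplicity
        slots.append(z_k)
        h = h + z_k                     # fold the new slot into the history
    return z_g, np.stack(slots)

z_g, slots = sample_slots(num_slots=4, slot_dim=8, global_dim=3,
                          rng=np.random.default_rng(0))
print(z_g.shape, slots.shape)
```

Because every slot sees both the shared scene variable and the running history, slots are correlated with one another rather than sampled independently, which is the property the abstract argues is needed for coherent scenes.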
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
Methods: ALIGN · Variational Inference
