Generative Learning of Differentiable Object Models for Compositional Interpretation of Complex Scenes

Antoni Nowinowski; Krzysztof Krawiec

arXiv:2506.08191 · cs.CV · June 11, 2025

TL;DR

This paper extends the Disentangler of Visual Priors (DVP), a scene-interpretation autoencoder, to scenes with multiple objects. New training modes add losses in the latent space to the usual image-space reconstruction loss, and a new, more complex benchmark is used for evaluation.

Contribution

The study introduces an extended DVP model capable of multi-object scene interpretation, with new training strategies and a more complex benchmark for evaluation.

Findings

01. Outperforms baselines in reconstruction quality
02. Better decomposition of overlapping objects
03. Enhanced training stability and efficiency

Abstract

This study builds on the architecture of the Disentangler of Visual Priors (DVP), a type of autoencoder that learns to interpret scenes by decomposing the perceived objects into independent visual aspects: shape, size, orientation, and color appearance. These aspects are expressed as latent parameters that control a differentiable renderer, which performs image reconstruction, so the model can be trained end-to-end by gradient-based optimization of a reconstruction loss. In this study, we extend the original DVP so that it can handle multiple objects in a scene. We also exploit the interpretability of its latent representation by using the decoder to sample additional training examples and by devising alternative training modes that rely on loss functions defined not only in the image space, but also in the latent space. This significantly facilitates training, which is otherwise challenging due to the presence of…
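
The two kinds of loss mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' DVP: the soft-disk `render` function, the three-parameter latent `(cx, cy, radius)`, and all other names below are hypothetical stand-ins chosen only to show how sampling a latent and rendering it yields a training pair on which both an image-space and a latent-space loss can be computed.

```python
import math
import random


def render(latent, size=16, sharpness=4.0):
    """Toy renderer (hypothetical, not DVP's): draws one soft disk.

    latent = (cx, cy, radius) in pixel units. Each pixel is a sigmoid of its
    signed distance to the disk edge, so it varies smoothly with the latent.
    """
    cx, cy, r = latent
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            d = math.hypot(x - cx, y - cy)
            row.append(1.0 / (1.0 + math.exp(sharpness * (d - r))))
        image.append(row)
    return image


def image_loss(img_a, img_b):
    """Mean squared error between two images (image-space loss)."""
    n = len(img_a) * len(img_a[0])
    total = sum((a - b) ** 2
                for row_a, row_b in zip(img_a, img_b)
                for a, b in zip(row_a, row_b))
    return total / n


def latent_loss(z_a, z_b):
    """Mean squared error between two latent vectors (latent-space loss)."""
    return sum((a - b) ** 2 for a, b in zip(z_a, z_b)) / len(z_a)


def sample_training_pair(rng, size=16):
    """Sample a latent and render it, giving an image with a known latent."""
    z = (rng.uniform(4, size - 4), rng.uniform(4, size - 4), rng.uniform(2, 4))
    return render(z, size), z


rng = random.Random(0)
image, z_true = sample_training_pair(rng)

# Stand-in for an encoder's prediction: the true latent, shifted by 0.5.
z_pred = tuple(v + 0.5 for v in z_true)

loss_img = image_loss(render(z_pred), image)  # compares reconstructions
loss_lat = latent_loss(z_pred, z_true)        # compares latents directly
```

Because the sampled image comes with its generating latent, the latent-space loss gives a direct training signal on each parameter, whereas the image-space loss only sees the parameters through the rendered pixels. In a real model both losses would be minimized with automatic differentiation; this sketch only evaluates them.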

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Face recognition and analysis