Constellation: Learning relational abstractions over objects for compositional imagination
James C.R. Whittington, Rishabh Kabra, Loic Matthey, Christopher P. Burgess, Alexander Lerchner

TL;DR
Constellation is a neural network that learns relational abstractions of static visual scenes and generalizes them across sensory variations. Combined with language association, these abstractions let the model imagine sensory content in new ways, a first step toward explicitly representing visual relationships for complex cognition.
Contribution
The paper introduces Constellation, a novel model that learns and generalizes relational abstractions of visual scenes, facilitating reasoning and imagination over object relationships.
Findings
Learned relational abstractions generalize across sensory variations.
Enables imaginative reasoning by combining relational and language cues.
Presented as a first step toward the explicit representation of visual relationships in scene understanding.
Abstract
Learning structured representations of visual scenes is currently a major bottleneck to bridging perception with reasoning. While there has been exciting progress with slot-based models, which learn to segment scenes into sets of objects, learning configurational properties of entire groups of objects is still under-explored. To address this problem, we introduce Constellation, a network that learns relational abstractions of static visual scenes, and generalises these abstractions over sensory particularities, thus offering a potential basis for abstract relational reasoning. We further show that this basis, along with language association, provides a means to imagine sensory content in new ways. This work is a first step in the explicit representation of visual relationships and using them for complex cognitive procedures.
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
