Testing GLOM's ability to infer wholes from ambiguous parts

Laura Culp; Sara Sabour; Geoffrey E. Hinton

arXiv:2211.16564 · cs.CV · December 1, 2022 · 1 citation

TL;DR

This paper tests whether a highly simplified GLOM network can resolve ambiguous image parts by having the parts settle on a shared representation of the object they belong to, and reports robustness to strong input noise and to transformations.

Contribution

The paper describes a highly simplified version of GLOM and shows that, with supervised training, it resolves ambiguity in parts and maintains robust object representations.

Findings

01. Successfully forms consistent object embeddings

02. Robust to input noise and transformations

03. Effective in resolving ambiguous parts

Abstract

The GLOM architecture proposed by Hinton [2021] is a recurrent neural network for parsing an image into a hierarchy of wholes and parts. When a part is ambiguous, GLOM assumes that the ambiguity can be resolved by allowing the part to make multi-modal predictions for the pose and identity of the whole to which it belongs and then using attention to similar predictions coming from other possibly ambiguous parts to settle on a common mode that is predicted by several different parts. In this study, we describe a highly simplified version of GLOM that allows us to assess the effectiveness of this way of dealing with ambiguity. Our results show that, with supervised training, GLOM is able to successfully form islands of very similar embedding vectors for all of the locations occupied by the same object and it is also robust to strong noise injections in the input and to out-of-distribution…
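
The disambiguation idea in the abstract (each ambiguous part proposes several candidate wholes, and the candidates that other parts also support win) can be illustrated with a small toy sketch. This is not the paper's model: the function name `settle_on_common_mode`, the sharpness parameter `beta`, the fixed candidate vectors, and the use of an exponentiated cosine similarity in place of learned attention are all assumptions made for illustration only.

```python
import numpy as np

def settle_on_common_mode(candidates, beta=5.0):
    """Toy consensus step (hypothetical helper, not from the paper).

    candidates: array of shape (num_parts, num_modes, dim); each part's
    candidate predictions for the whole. All vectors must be nonzero.
    Returns the chosen mode index per part and the mean of chosen vectors.
    """
    num_parts, num_modes, dim = candidates.shape
    # Normalise so that dot products are cosine similarities.
    unit = candidates / np.linalg.norm(candidates, axis=-1, keepdims=True)
    flat = unit.reshape(num_parts * num_modes, dim)
    sim = flat @ flat.T
    # A candidate may only gather support from candidates of *other* parts.
    part_id = np.repeat(np.arange(num_parts), num_modes)
    from_other_part = part_id[:, None] != part_id[None, :]
    # Attention-like weighting: similar candidates contribute far more.
    support = (np.exp(beta * sim) * from_other_part).sum(axis=1)
    support = support.reshape(num_parts, num_modes)
    chosen = support.argmax(axis=1)
    picked = candidates[np.arange(num_parts), chosen]
    return chosen, picked.mean(axis=0)

# Three parts, two candidate wholes each; only the first candidates agree.
toy = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.1], [-1.0, 0.0]],
    [[0.9, -0.1], [0.0, -1.0]],
])
chosen, consensus = settle_on_common_mode(toy)
print(chosen.tolist())  # [0, 0, 0]: every part settles on its first mode
print(consensus)
```

In this toy setup each part's second candidate points in a direction no other part supports, so it collects almost no weight, which mirrors the abstract's claim that agreement across possibly ambiguous parts selects a common mode.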

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis