Learning Object Permanence from Videos via Latent Imaginations
Manuel Traub, Frederic Becker, Sebastian Otte, Martin V. Butz

TL;DR
This paper presents Loci-Looped, a self-supervised deep learning model that learns object permanence from videos by fusing latent imaginations with observations, enabling it to track objects through occlusions and predict their reappearance.
Contribution
The paper introduces Loci-Looped, a novel self-supervised, interpretable model that learns physical object concepts directly from video data and outperforms existing methods in occlusion handling.
Findings
Loci-Looped effectively tracks objects through occlusions.
It anticipates object reappearance and shows signs of surprise when it observes implausible object behavior.
It outperforms state-of-the-art baseline models in occlusion scenarios.
Abstract
While human infants exhibit knowledge about object permanence from two months of age onwards, deep-learning approaches still largely fail to recognize objects' continued existence. We introduce a slot-based autoregressive deep learning system, the looped location and identity tracking model Loci-Looped, which learns to adaptively fuse latent imaginations with pixel-space observations into consistent latent object-specific what and where encodings over time. The novel loop empowers Loci-Looped to learn the physical concepts of object permanence, directional inertia, and object solidity through observation alone. As a result, Loci-Looped tracks objects through occlusions, anticipates their reappearance, and shows signs of surprise and internal revisions when observing implausible object behavior. Notably, Loci-Looped outperforms state-of-the-art baseline models in handling object…
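The fusion idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names `fuse_latents` and `track_through_occlusion`, the constant-velocity "imagination", and the hand-set visibility gate are all assumptions made for illustration. In Loci-Looped the fusion is learned, and the latent codes are object-specific what and where encodings rather than raw positions.

```python
from typing import List, Sequence


def fuse_latents(observed: Sequence[float],
                 imagined: Sequence[float],
                 gate: float) -> List[float]:
    """Convex combination of an observed and an imagined latent code.

    gate = 1.0 trusts the observation fully; gate = 0.0 trusts the
    imagination fully. (Hypothetical helper, not from the paper.)
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    if len(observed) != len(imagined):
        raise ValueError("latent codes must have the same length")
    return [gate * o + (1.0 - gate) * i for o, i in zip(observed, imagined)]


def track_through_occlusion(observations: Sequence[Sequence[float]],
                            visibilities: Sequence[float],
                            velocity: Sequence[float],
                            start: Sequence[float]) -> List[List[float]]:
    """Track a position code over time, falling back on imagination.

    Assumption: the 'imagination' is a constant-velocity prediction,
    and the gate is simply the given visibility of the object.
    """
    state = list(start)
    trajectory = []
    for obs, vis in zip(observations, visibilities):
        # Imagine the next state from the previous one (directional inertia).
        imagined = [s + v for s, v in zip(state, velocity)]
        # Blend the imagination with the (possibly uninformative) observation.
        state = fuse_latents(obs, imagined, gate=vis)
        trajectory.append(state)
    return trajectory


# The object is visible at steps 1 and 4 and occluded at steps 2 and 3,
# where the observation carries no information (placeholder 0.0).
trajectory = track_through_occlusion(
    observations=[[1.0], [0.0], [0.0], [4.0]],
    visibilities=[1.0, 0.0, 0.0, 1.0],
    velocity=[1.0],
    start=[0.0],
)
```

In this toy setting the tracked state keeps moving during the occluded steps because the gate hands control to the imagined prediction, which is the behavior the abstract describes as tracking objects through occlusions and anticipating their reappearance.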
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Human Pose and Action Recognition
