Learning Object Permanence from Videos via Latent Imaginations

Manuel Traub; Frederic Becker; Sebastian Otte; Martin V. Butz

arXiv:2310.10372 · cs.CV · April 12, 2024 · 1 citation

Open Access

TL;DR

This paper presents Loci-Looped, a self-supervised deep learning model that learns object permanence from videos by fusing latent imaginations with observations, enabling it to track objects through occlusions and predict their reappearance.

Contribution

The paper introduces Loci-Looped, a novel self-supervised, interpretable model that learns physical object concepts directly from video data and outperforms existing methods in occlusion handling.

Findings

01. Loci-Looped tracks objects through occlusions.

02. It anticipates object reappearance and detects implausible object behavior.

03. It outperforms state-of-the-art baseline models in occlusion scenarios.

Abstract

While human infants exhibit knowledge about object permanence from two months of age onwards, deep-learning approaches still largely fail to recognize objects' continued existence. We introduce a slot-based autoregressive deep learning system, the looped location and identity tracking model Loci-Looped, which learns to adaptively fuse latent imaginations with pixel-space observations into consistent latent object-specific what and where encodings over time. The novel loop empowers Loci-Looped to learn the physical concepts of object permanence, directional inertia, and object solidity through observation alone. As a result, Loci-Looped tracks objects through occlusions, anticipates their reappearance, and shows signs of surprise and internal revisions when observing implausible object behavior. Notably, Loci-Looped outperforms state-of-the-art baseline models in handling object…
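
The fusion of imagination and observation described in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: Loci-Looped learns its fusion adaptively inside a slot-based network, whereas the sketch below assumes a single object, a hand-set scalar `gate`, a 2D position standing in for the "where" encoding, and a constant-velocity step standing in for the latent imagination. All names (`SlotState`, `imagine`, `fuse`, `track`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec = Tuple[float, float]


@dataclass
class SlotState:
    """Toy stand-in for one object slot (assumption, not the paper's encoding)."""
    position: Vec  # plays the role of the "where" code
    velocity: Vec  # carried forward to model directional inertia


def imagine(state: SlotState) -> Vec:
    """Imagined next position, assuming constant velocity."""
    return (state.position[0] + state.velocity[0],
            state.position[1] + state.velocity[1])


def fuse(predicted: Vec, observed: Optional[Vec], gate: float) -> Vec:
    """Convex blend of observation and imagination.

    gate = 1.0 trusts the observation fully, gate = 0.0 trusts the imagination.
    An occluded frame (observed is None) falls back to the imagination alone.
    """
    if observed is None:
        return predicted
    return tuple(gate * o + (1.0 - gate) * p
                 for o, p in zip(observed, predicted))


def track(initial: SlotState,
          observations: List[Optional[Vec]],
          gate: float = 0.8) -> List[Vec]:
    """Run the imagine-then-fuse loop over a sequence of frames."""
    state = initial
    trajectory = []
    for obs in observations:
        predicted = imagine(state)
        fused = fuse(predicted, obs, gate)
        # Re-estimate velocity from the fused step so the motion persists.
        velocity = (fused[0] - state.position[0],
                    fused[1] - state.position[1])
        state = SlotState(position=fused, velocity=velocity)
        trajectory.append(fused)
    return trajectory
```

Passing `None` for a frame marks the object as occluded in that frame; the loop then keeps the object's estimate moving along its imagined path until an observation returns. In the paper's setting the gate would be learned from data rather than fixed by hand.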

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Human Pose and Action Recognition