Object Permanence Emerges in a Random Walk along Memory
Pavel Tokmakov, Allan Jabri, Jie Li, Adrien Gaidon

TL;DR
This paper introduces a self-supervised method for learning object permanence by optimizing temporal coherence in memory representations, enabling models to localize and predict occluded objects without human annotations.
Contribution
It presents a novel self-supervised learning approach that leverages a Markov walk on memory features to capture object permanence without explicit supervision or assumptions about object dynamics.
Findings
Outperforms existing approaches on several datasets of increasing complexity and realism
Requires minimal supervision and no human annotations
Successfully localizes and predicts occluded objects
Abstract
This paper proposes a self-supervised objective for learning representations that localize objects under occlusion, a property known as object permanence. A central question is the choice of learning signal in cases of total occlusion. Rather than directly supervising the locations of invisible objects, we propose a self-supervised objective that requires neither human annotation nor assumptions about object dynamics. We show that object permanence can emerge by optimizing for temporal coherence of memory: we fit a Markov walk along a space-time graph of memories, where the states in each time step are non-Markovian features from a sequence encoder. This leads to a memory representation that stores occluded objects and predicts their motion, to better localize them. The resulting model outperforms existing approaches on several datasets of increasing complexity and realism, despite…
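The Markov walk along a space-time graph mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the random `memory` array stands in for the sequence encoder's features, and the softmax affinity, the `temperature` value, the function names, and the choice of start and target nodes are all assumptions made for illustration.

```python
import numpy as np

def transition_matrix(feats_a, feats_b, temperature=0.1):
    """Row-stochastic transitions from nodes at one time step to the next.

    Assumption: affinities are a softmax over cosine similarities.
    """
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def walk(memory):
    """Chain one-step transitions along time; memory has shape (T, N, D)."""
    reach = np.eye(memory.shape[1])
    for t in range(len(memory) - 1):
        reach = reach @ transition_matrix(memory[t], memory[t + 1])
    # reach[i, j]: probability that a walker starting at node i in the
    # first frame ends at node j in the last frame.
    return reach

def walk_loss(memory, start_node, target_node):
    """Negative log-probability of reaching target_node from start_node."""
    reach = walk(memory)
    return -np.log(reach[start_node, target_node] + 1e-12)

# Hypothetical stand-in for encoder memory: 5 time steps, 16 nodes, 32 dims.
rng = np.random.default_rng(0)
memory = rng.normal(size=(5, 16, 32))
reach = walk(memory)
loss = walk_loss(memory, start_node=3, target_node=7)
```

In this sketch the loss only constrains the two endpoints of the walk, so intermediate time steps, which could include fully occluded ones, receive no direct location supervision. That is one way to read the abstract's point about not supervising the locations of invisible objects.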
Taxonomy
Topics: Human Pose and Action Recognition · Robotics and Sensor-Based Localization · Domain Adaptation and Few-Shot Learning
