Learning What and Where: Disentangling Location and Identity Tracking Without Supervision
Manuel Traub, Sebastian Otte, Tobias Menge, Matthias Karlbauer, Jannik Thümmel, Martin V. Butz

TL;DR
This paper introduces Loci, a self-supervised system inspired by the brain's dorsal and ventral pathways. Loci disentangles object identity from object location in videos, improving object tracking and reasoning without supervision.
Contribution
Loci is a novel self-supervised model that separates 'what' and 'where' information, enhancing object tracking and reasoning in video streams without labeled data.
Findings
Loci outperforms existing approaches on the CATER tracking challenge.
It reliably extracts individual objects and separates each object's encoding into location and Gestalt components.
The model speeds up learning and improves memory efficiency.
Abstract
Our brain can almost effortlessly decompose visual data streams into background and salient objects. Moreover, it can anticipate object motion and interactions, which are crucial abilities for conceptual planning and reasoning. Recent object reasoning datasets, such as CATER, have revealed fundamental shortcomings of current vision-based AI systems, particularly when targeting explicit object representations, object permanence, and object reasoning. Here we introduce a self-supervised LOCation and Identity tracking system (Loci), which excels on the CATER tracking challenge. Inspired by the dorsal and ventral pathways in the brain, Loci tackles the binding problem by processing separate, slot-wise encodings of 'what' and 'where'. Loci's predictive coding-like processing encourages active error minimization, such that individual slots tend to encode individual objects. Interactions…
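To make the idea of separate slot-wise 'what' and 'where' codes concrete, here is a toy sketch in plain Python. It is not Loci's architecture, which uses learned neural encoders and decoders. Everything here is a hypothetical simplification: the `Slot` fields, the Gaussian location mask, the scalar appearance value standing in for a Gestalt code, and the brute-force search standing in for error minimization. The sketch only shows that when location and identity live in separate codes, the location can be corrected by minimizing prediction error while the identity code stays fixed.

```python
import math
from dataclasses import dataclass


@dataclass
class Slot:
    # Hypothetical slot layout (an assumption, not Loci's actual encoding):
    gestalt: float       # "what": appearance code, reduced to one scalar for brevity
    x: float             # "where": horizontal center in pixels
    y: float             # "where": vertical center in pixels
    sigma: float = 1.5   # "where": spatial spread of the object's mask


def where_mask(slot, width, height):
    """Gaussian mask computed only from the slot's location code."""
    return [[math.exp(-((c - slot.x) ** 2 + (r - slot.y) ** 2) / (2 * slot.sigma ** 2))
             for c in range(width)] for r in range(height)]


def render(slots, width, height, background=0.0):
    """Compose slots over a background by blending slot appearances per pixel by mask weight."""
    masks = [where_mask(s, width, height) for s in slots]
    frame = []
    for r in range(height):
        row = []
        for c in range(width):
            weights = [m[r][c] for m in masks]
            total = sum(weights)
            # The background covers whatever the object masks leave uncovered.
            bg_weight = max(0.0, 1.0 - total)
            value = sum(w * s.gestalt for w, s in zip(weights, slots)) + bg_weight * background
            row.append(value / (total + bg_weight))  # denominator is always >= 1
        frame.append(row)
    return frame


def prediction_error(predicted, observed):
    """Mean squared error, the quantity a predictive-coding-style model tries to reduce."""
    n = sum(len(row) for row in observed)
    return sum((p - o) ** 2
               for prow, orow in zip(predicted, observed)
               for p, o in zip(prow, orow)) / n


def refine_where(slot, observed, candidates, width, height):
    """Toy error minimization: keep 'what' fixed and choose the location with the lowest error."""
    def error_at(xy):
        moved = Slot(slot.gestalt, xy[0], xy[1], slot.sigma)
        return prediction_error(render([moved], width, height), observed)
    best_x, best_y = min(candidates, key=error_at)
    return Slot(slot.gestalt, best_x, best_y, slot.sigma)


W, H = 12, 12
observed = render([Slot(gestalt=1.0, x=7.0, y=4.0)], W, H)   # object actually at (7, 4)
guess = Slot(gestalt=1.0, x=3.0, y=3.0)                       # right identity, wrong location
grid = [(float(x), float(y)) for x in range(W) for y in range(H)]
fixed = refine_where(guess, observed, grid, W, H)
print(fixed.x, fixed.y)  # 7.0 4.0
```

Because the appearance value never enters the location search, only the 'where' code changes while the 'what' code stays intact. This is the kind of separation the abstract describes, reduced here to a single object on a small grid.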
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection
