SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos
Gamaleldin F. Elsayed, Aravindh Mahendran, Sjoerd van Steenkiste, Klaus Greff, Michael C. Mozer, Thomas Kipf

TL;DR
SAVi++ is an object-centric video model that learns to segment and track objects in complex real-world videos by predicting depth signals, without explicit segmentation supervision. It can also use LiDAR data to improve performance.
Contribution
The paper introduces SAVi++, an end-to-end model that uses depth signals to enable scalable object segmentation and tracking in real-world videos without explicit segmentation supervision.
Findings
Successfully segments complex scenes with moving cameras and diverse objects.
Learns from naturalistic backgrounds without explicit segmentation labels.
Utilizes LiDAR data to enhance real-world object tracking.
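LiDAR gives depth only at the sparse set of pixels hit by a laser return. One common way to use it as a training target is to project the points into the camera image and compute the depth loss only where a return exists. The sketch below shows that approach. It assumes the points are already in the camera frame with z pointing forward, a pinhole camera model, and flat row-major depth maps. The functions `project_lidar_to_depth` and `sparse_depth_loss` are hypothetical illustrations and are not the authors' code.

```python
def project_lidar_to_depth(points, fx, fy, cx, cy, width, height):
    """Rasterize camera-frame LiDAR points (x, y, z) into a sparse depth image.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal point
    (cx, cy). Returns (depth, valid) as flat row-major lists of length
    width * height; valid[p] is True only where some point landed.
    """
    depth = [0.0] * (width * height)
    valid = [False] * (width * height)
    for x, y, z in points:
        if z <= 0:
            continue  # point is behind the camera
        # Nearest pixel index under the pinhole projection.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            idx = v * width + u
            # Keep the nearest return if several points hit the same pixel.
            if not valid[idx] or z < depth[idx]:
                depth[idx] = z
                valid[idx] = True
    return depth, valid


def sparse_depth_loss(pred, target, valid):
    """Mean squared error restricted to pixels that have a LiDAR return.

    Pixels without a return contribute nothing. Returns 0.0 when no pixel
    is valid, to avoid dividing by zero.
    """
    errors = [(a - b) ** 2 for a, b, ok in zip(pred, target, valid) if ok]
    if not errors:
        return 0.0
    return sum(errors) / len(errors)
```

Masking the loss this way means unobserved pixels (sky, out-of-range regions) are never pushed toward an arbitrary fill value such as zero.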
Abstract
The visual world can be parsimoniously characterized in terms of distinct entities with sparse interactions. Discovering this compositional structure in dynamic visual scenes has proven challenging for end-to-end computer vision approaches unless explicit instance-level supervision is provided. Slot-based models leveraging motion cues have recently shown great promise in learning to represent, segment, and track objects without direct supervision, but they still fail to scale to complex real-world multi-object videos. In an effort to bridge this gap, we take inspiration from human development and hypothesize that information about scene geometry in the form of depth signals can facilitate object-centric learning. We introduce SAVi++, an object-centric video model which is trained to predict depth signals from a slot-based video representation. By further leveraging best practices for…
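Predicting depth "from a slot-based video representation" can be sketched in the style of mixture-based slot decoders. Each slot decodes its own depth value and a mask logit for every pixel. The masks are normalized with a softmax across slots, and the final depth map is the mask-weighted sum of the per-slot depths. This is a minimal sketch under that assumption, using plain Python lists. The decoder outputs and the names `combine_slot_depths` and `mse` are hypothetical, and the sketch is not the exact SAVi++ architecture or loss.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def combine_slot_depths(depths, mask_logits):
    """Combine per-slot depth predictions into a single depth map.

    depths[k][p]: depth that slot k predicts for pixel p (assumed decoder output).
    mask_logits[k][p]: unnormalized mask logit of slot k at pixel p.
    Returns (depth_map, masks), where masks[k][p] sums to 1 over k for each p.
    """
    num_slots = len(depths)
    num_pixels = len(depths[0])
    masks = [[0.0] * num_pixels for _ in range(num_slots)]
    depth_map = [0.0] * num_pixels
    for p in range(num_pixels):
        # Slots compete for each pixel via a softmax over their logits.
        weights = softmax([mask_logits[k][p] for k in range(num_slots)])
        for k in range(num_slots):
            masks[k][p] = weights[k]
            depth_map[p] += weights[k] * depths[k][p]
    return depth_map, masks


def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


# Two slots, three pixels: slot 0 claims pixel 0, slot 1 claims pixel 2,
# and pixel 1 is shared equally.
depth_map, masks = combine_slot_depths(
    [[1.0, 1.0, 1.0], [5.0, 5.0, 5.0]],
    [[10.0, 0.0, -10.0], [-10.0, 0.0, 10.0]],
)
loss = mse(depth_map, [1.0, 3.0, 5.0])
```

Because the only training signal is depth reconstruction, any object segmentation has to emerge from the masks. The slots are never given segmentation labels.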
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Video Surveillance and Tracking Methods
