TL;DR
This paper introduces RandSF.Q, a novel unsupervised video object-centric learning method that predicts queries from random slot-feature pairs, effectively learning transition dynamics and improving scene understanding.
Contribution
It proposes a new transitioner that takes both slots and features as input and is trained on random slot-feature pairs to learn transition dynamics, achieving state-of-the-art results.
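The summary does not say how the random slot-feature pairs are formed. The sketch below is one plausible reading and is labeled as an assumption. Slots are taken from frame i and features from a later frame j, with the gap j - i drawn at random up to a cap. The transitioner then has to predict queries across varying time steps instead of only memorizing next-frame shortcuts. The function name `sample_slot_feature_pairs` and the `max_gap` parameter are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Tuple


def sample_slot_feature_pairs(
    num_frames: int, num_pairs: int, max_gap: int, seed: int = 0
) -> List[Tuple[int, int]]:
    """Hypothetical sampler returning (slot_frame, feature_frame) index pairs.

    Assumption (not from the paper): slots come from frame i and features
    from a later frame j with 1 <= j - i <= max_gap, so the transitioner
    is trained over random time gaps. Requires num_frames >= 2, max_gap >= 1.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_pairs):
        # Pick a source frame that still has at least one later frame.
        i = rng.randrange(0, num_frames - 1)
        # Random forward gap, clipped so j stays inside the clip.
        gap = rng.randint(1, min(max_gap, num_frames - 1 - i))
        pairs.append((i, i + gap))
    return pairs
```

For example, `sample_slot_feature_pairs(8, 4, 3)` returns four index pairs from an 8-frame clip. In each pair the feature frame lies one to three frames after the slot frame.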
Findings
Surpasses existing methods by up to 10 points in object discovery
Significantly improves scene representation and downstream scene understanding tasks
Effectively learns transition dynamics through random slot-feature pair training
Abstract
Unsupervised video Object-Centric Learning (OCL) is promising because it enables object-level scene representation and understanding, much as humans perceive scenes. Mainstream video OCL methods adopt a recurrent architecture: an aggregator aggregates the current video frame into object features, termed slots, under a set of queries, and a transitioner maps the current slots to queries for the next frame. This architecture is effective, but all existing implementations (i1) neglect to incorporate next-frame features, the most informative source for query prediction, and (i2) fail to learn transition dynamics, the knowledge essential for query prediction. To address these issues, we propose Random Slot-Feature pair for learning Query prediction (RandSF.Q): (t1) We design a new transitioner to incorporate both slots and features, which provides more information for query prediction;…
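The recurrent aggregator/transitioner loop described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a simplified single-iteration slot-attention step as the aggregator. It also assumes a hypothetical transitioner in which each current slot cross-attends to the next frame's features, which reflects the idea in (t1) of feeding next-frame features into query prediction. A mainstream transitioner would take only `slots`. The functions `aggregate`, `transit`, and `run_video` are illustrative names, not the paper's implementation.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate(queries, feats):
    """Simplified single-iteration slot attention (assumed aggregator).

    queries: (K, D), feats: (N, D) -> slots: (K, D)
    """
    logits = feats @ queries.T / np.sqrt(queries.shape[1])  # (N, K)
    attn = softmax(logits, axis=1)  # slots compete for each feature
    attn = attn / attn.sum(axis=0, keepdims=True)  # normalize per slot
    return attn.T @ feats  # each slot is a weighted mean of features


def transit(slots, next_feats):
    """Hypothetical transitioner using current slots AND next-frame features.

    Each slot cross-attends to next-frame features; the result is added
    as a residual to form the query for the next frame.
    """
    logits = slots @ next_feats.T / np.sqrt(slots.shape[1])  # (K, N)
    attn = softmax(logits, axis=1)
    return slots + attn @ next_feats  # (K, D)


def run_video(frames, init_queries):
    """Recurrent loop: aggregate frame t, then predict queries for frame t+1."""
    queries = init_queries
    all_slots = []
    for t, feats in enumerate(frames):
        slots = aggregate(queries, feats)
        all_slots.append(slots)
        if t + 1 < len(frames):
            queries = transit(slots, frames[t + 1])
    return np.stack(all_slots)  # (T, K, D)
```

Usage, with random features standing in for a frame encoder: `run_video(np.random.default_rng(0).normal(size=(5, 16, 8)), np.random.default_rng(1).normal(size=(3, 8)))` yields slots of shape `(5, 3, 8)`, which corresponds to 5 frames, 3 slots, and 8 channels. Making the transitioner condition on the next frame lets the predicted queries adapt to what actually appears in that frame, instead of being extrapolated from past slots alone.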
