Predicting Video Slot Attention Queries from Random Slot-Feature Pairs

Rongzhen Zhao; Jian Li; Juho Kannala; Joni Pajarinen

arXiv:2508.01345 · cs.CV · April 20, 2026

TL;DR

This paper introduces RandSF.Q, an unsupervised video object-centric learning method that predicts next-frame queries from random slot-feature pairs, which lets it learn transition dynamics and improves scene understanding.

Contribution

The paper proposes a new transitioner that takes both slots and features as input and is trained on random slot-feature pairs to learn transition dynamics, achieving state-of-the-art results.

Findings

01. Surpasses existing methods by up to 10 points in object discovery

02. Significantly improves scene representation and downstream scene understanding tasks

03. Effectively learns transition dynamics through random slot-feature pair training

Abstract

Unsupervised video Object-Centric Learning (OCL) is promising because it enables object-level scene representation and understanding, as humans do. Mainstream video OCL methods adopt a recurrent architecture: an aggregator aggregates the current video frame into object features, termed slots, under some queries; a transitioner transits the current slots into queries for the next frame. This architecture is effective, but all existing implementations both (i1) neglect to incorporate next-frame features, the most informative source for query prediction, and (i2) fail to learn transition dynamics, the knowledge essential for query prediction. To address these issues, we propose Random Slot-Feature pair for learning Query prediction (RandSF.Q): (t1) We design a new transitioner to incorporate both slots and features, which provides more information for query prediction;…
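
As a rough illustration of the recurrent architecture the abstract describes, the NumPy sketch below runs an aggregator and a transitioner over a short sequence of frame features. It is not the authors' implementation: the function names `aggregate`, `transit`, and `run_video`, the single attention step per frame, the residual cross-attention used as the transitioner, and the random toy inputs are all assumptions made for illustration, and the sketch has no learned parameters.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate(features, queries):
    """Assumed aggregator: one simplified slot-attention step.

    features: (n, d) frame features; queries: (k, d) -> slots: (k, d)
    """
    d = features.shape[1]
    logits = queries @ features.T / np.sqrt(d)              # (k, n)
    attn = softmax(logits, axis=0)                          # slots compete per feature
    attn = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)  # weighted mean per slot
    return attn @ features


def transit(slots, next_features):
    """Assumed transitioner: current slots cross-attend to next-frame features.

    Using next-frame features here mirrors point (t1); the exact form is hypothetical.
    """
    d = slots.shape[1]
    logits = slots @ next_features.T / np.sqrt(d)  # (k, n)
    attn = softmax(logits, axis=1)                 # each slot attends over features
    return slots + attn @ next_features            # residual update -> next queries


def run_video(frames, init_queries):
    """Recurrent loop: aggregate frame t, then predict queries for frame t + 1."""
    queries = init_queries
    all_slots = []
    for t, feats in enumerate(frames):
        slots = aggregate(feats, queries)
        all_slots.append(slots)
        if t + 1 < len(frames):
            queries = transit(slots, frames[t + 1])
    return np.stack(all_slots)


rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 16, 8))    # toy input: 4 frames, 16 features, dim 8
init_queries = rng.normal(size=(3, 8))  # 3 initial queries
slots = run_video(frames, init_queries)
print(slots.shape)  # (4, 3, 8)
```

In this sketch the only change relative to a slots-only transitioner is the second argument of `transit`, which is where next-frame information enters the query prediction.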

Code & Models

Repositories

Genera1Z/RandSF.Q (GitHub)
