TL;DR
SmoothSA enhances Slot Attention in object-centric learning by smoothing iterations and recurrences, improving object representation aggregation in images and videos through feature-informed query initialization and differentiated transforms.
Contribution
SmoothSA preheats cold-start queries with input-feature information and applies differentiated transforms to first and non-first frames, improving Slot Attention's performance on images and videos.
Findings
Improved object discovery, recognition, and visual reasoning results.
Effective smoothing of Slot Attention iterations and recurrences.
Validated through comprehensive experiments and visual analyses.
Abstract
Slot Attention (SA) lies at the heart of mainstream Object-Centric Learning (OCL). Image features can be aggregated into object-level representations by SA iteratively refining cold-start query slots. For video, such aggregation proceeds by SA recurrently shared across frames, with queries cold-started on the first frame and transitioned from the previous frame's slots thereafter. However, cold-start queries lack sample-specific cues, which hinders precise aggregation on an image or a video's first frame; non-first frames' queries are already sample-specific and thus require aggregation transforms different from those of the first frame. We address these issues with our SmoothSA: (1) To smooth SA iterations on an image or a video's first frame, we preheat cold-start queries with rich input-feature information, using a tiny module self-distilled inside OCL; (2) To smooth SA…
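To make the baseline setting in the abstract concrete, here is a minimal NumPy sketch of iterative slot refinement on one frame and recurrent reuse of slots across video frames. It is a simplified illustration, not the paper's implementation: the learned query/key/value projections, LayerNorm, GRU and MLP of standard Slot Attention are omitted, the function names `slot_attention` and `run_video` are hypothetical, and the iteration counts `first_iters` and `later_iters` are placeholder assumptions. The preheating module and the differentiated transforms of SmoothSA are not reproduced here.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention(features, slots, num_iters=3, eps=1e-8):
    """Simplified Slot Attention (no learned parameters).

    features: (N, D) array of input features for one image or frame.
    slots:    (K, D) array of query slots to refine.
    Returns the refined slots (K, D) and the last attention map (K, N).
    """
    d = features.shape[1]
    attn = None
    for _ in range(num_iters):
        logits = slots @ features.T / np.sqrt(d)  # (K, N) similarity scores
        attn = softmax(logits, axis=0)            # slots compete for each feature
        # Normalize over features so each slot becomes a weighted mean.
        weights = attn / (attn.sum(axis=1, keepdims=True) + eps)
        slots = weights @ features                # (K, D) updated slots
    return slots, attn


def run_video(frames, num_slots, rng, first_iters=3, later_iters=1):
    """Apply the same aggregation recurrently across frames.

    Queries are cold-started (random) on the first frame and taken from the
    previous frame's slots afterwards. The iteration counts are assumptions.
    """
    d = frames[0].shape[1]
    slots = rng.normal(size=(num_slots, d))  # cold-start queries, no sample-specific cues
    outputs = []
    for t, feats in enumerate(frames):
        iters = first_iters if t == 0 else later_iters
        slots, _ = slot_attention(feats, slots, num_iters=iters)
        outputs.append(slots)
    return outputs


rng = np.random.default_rng(0)
frames = [rng.normal(size=(16, 8)) for _ in range(3)]  # 3 frames, 16 features of dim 8
video_slots = run_video(frames, num_slots=4, rng=rng)
```

In this sketch the random initialization on the first frame is the "cold start" the abstract refers to; from the second frame onward the queries already carry information about the specific video, which is the asymmetry SmoothSA is designed to address.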
