Smoothing Slot Attention Iterations and Recurrences

Rongzhen Zhao; Wenyan Yang; Juho Kannala; Joni Pajarinen

arXiv:2508.05417·cs.CV·May 4, 2026

TL;DR

SmoothSA improves Slot Attention for object-centric learning by smoothing its iterations (on images and a video's first frame) and its recurrences (on subsequent video frames). It does this with feature-informed query initialization and differentiated transforms, so that object representations are aggregated more precisely in both images and videos.

Contribution

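SmoothSA preheats cold-start queries with input-feature information using a tiny self-distilled module, and applies aggregation transforms that differ between a video's first frame and its later frames. Together, these improve Slot Attention on both images and videos.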

Findings

1. Improved object discovery, recognition, and visual reasoning results.
2. Effective smoothing of Slot Attention iterations and recurrences.
3. Validated through comprehensive experiments and visual analyses.

Abstract

Slot Attention (SA) lies at the heart of mainstream Object-Centric Learning (OCL). SA aggregates image features into object-level representations by iteratively refining cold-start query slots. For video, the same SA is applied recurrently across frames: queries are cold-started on the first frame and transitioned from the previous frame's slots thereafter. However, cold-start queries lack sample-specific cues, which hinders precise aggregation on an image or on a video's first frame. In contrast, queries on non-first frames are already sample-specific and therefore call for aggregation transforms that differ from those used on the first frame. We address these issues with SmoothSA: (1) to smooth SA iterations on an image or a video's first frame, we preheat cold-start queries with rich input-feature information, using a tiny module self-distilled inside OCL; (2) To smooth SA…
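The abstract distinguishes SA iterations (refining one set of queries against a single image or frame) from SA recurrences (carrying slots from frame to frame). The NumPy sketch below illustrates both under simplifying assumptions and is not the SmoothSA implementation (see the Genera1Z/SmoothSA repository for that). Three things are simplified. The GRU and residual MLP of the original Slot Attention update are replaced by a simple `tanh` residual. Cold-start queries are drawn from a standard normal rather than a learned Gaussian. The frame-to-frame transition is the identity. The hooks `init_fn` (a stand-in for feature-informed "preheating") and `params_first`/`params_later` (a stand-in for differentiated first-frame vs. later-frame transforms) are hypothetical names. They only mark where those ideas would plug in.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def slot_attention(features, slots, params, n_iters=3):
    """Iteratively refine K query slots (K, D) against N input features (N, D)."""
    W_q, W_k, W_v, W_u = params
    d = slots.shape[-1]
    x = layer_norm(features)
    k, v = x @ W_k, x @ W_v                  # keys and values, (N, D)
    for _ in range(n_iters):
        q = layer_norm(slots) @ W_q          # queries from the current slots, (K, D)
        logits = k @ q.T / np.sqrt(d)        # (N, K)
        masks = softmax(logits, axis=1)      # slots compete for each input feature
        weights = masks / (masks.sum(axis=0, keepdims=True) + 1e-8)  # weighted mean over inputs
        updates = weights.T @ v              # aggregated features per slot, (K, D)
        slots = slots + np.tanh(updates @ W_u)  # simplified stand-in for SA's GRU + MLP update
    return slots, masks

def video_slot_attention(frames, params_first, params_later, n_slots, rng,
                         init_fn=None, n_iters=3):
    """Apply SA recurrently over a list of (N, D) frame features; returns (T, K, D)."""
    d = frames[0].shape[-1]
    if init_fn is None:
        slots = rng.normal(size=(n_slots, d))  # cold-start queries: no sample-specific cues
    else:
        slots = init_fn(frames[0])             # hypothetical feature-informed ("preheated") queries
    per_frame = []
    for t, feats in enumerate(frames):
        # hypothetical: separate transforms for the first frame vs. later frames
        params = params_first if t == 0 else params_later
        slots, _ = slot_attention(feats, slots, params, n_iters)
        per_frame.append(slots)  # identity transition: these slots seed the next frame's queries
    return np.stack(per_frame)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, N, K, T = 16, 64, 4, 3
    make_params = lambda: [rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(4)]
    frames = [rng.normal(size=(N, D)) for _ in range(T)]
    out = video_slot_attention(frames, make_params(), make_params(), K, rng)
    print(out.shape)  # (3, 4, 16)
```

In this sketch, the only difference between the image case and the first video frame is that the latter feeds its output slots forward. The paper's point is visible in the structure. The first call to `slot_attention` starts from queries carrying no information about the input, whereas every later call starts from slots already fitted to the previous frame. That is why a better initialization targets the first, and separate transforms target the rest.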

Code & Models

Repositories

Genera1Z/SmoothSA (GitHub)