Attention Normalization Impacts Cardinality Generalization in Slot Attention
Markus Krimmel, Jan Achterhold, Joerg Stueckler

TL;DR
This paper shows that normalization choices in Slot Attention significantly affect its ability to generalize to more objects, and proposes simple modifications that improve segmentation performance.
Contribution
It introduces alternative normalization schemes in Slot Attention that enhance its generalization to varying object counts in unsupervised image segmentation.
Findings
Normalization impacts Slot Attention's generalization.
Proposed scaled weighted sum improves performance.
Simple modifications lead to better object count handling.
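The minimal pure-Python sketch below contrasts the original normalization with a scaled weighted sum. It assumes the attention weights have already been softmax-normalized over slots. The function name `aggregate`, the mode strings, and the `scale` constant are hypothetical, and the sketch does not reproduce the exact constant the paper uses.

```python
from typing import List

Matrix = List[List[float]]


def aggregate(attn: Matrix, values: Matrix, mode: str = "weighted_mean",
              scale: float = 1.0, eps: float = 1e-8) -> Matrix:
    """Aggregate N input values into K slot updates (hypothetical helper).

    attn:   K x N attention weights, assumed already softmaxed over slots.
    values: N x D value vectors.
    mode:   "weighted_mean" divides each slot's weighted sum by that slot's
            total attention mass, as in the original Slot Attention;
            "scaled_weighted_sum" divides by the fixed constant `scale`
            (an assumed stand-in for the paper's scaling).
    """
    num_inputs, dim = len(values), len(values[0])
    updates = []
    for slot_weights in attn:
        if mode == "weighted_mean":
            denom = sum(slot_weights) + eps  # renormalize over inputs
        elif mode == "scaled_weighted_sum":
            denom = scale  # keep the attention mass, rescale by a constant
        else:
            raise ValueError(f"unknown mode: {mode}")
        updates.append([
            sum(slot_weights[n] * values[n][d] for n in range(num_inputs)) / denom
            for d in range(dim)
        ])
    return updates
```

Consider two slots over three inputs with values 1, 2 and 3. One slot has attention weight 0.9 on every input and the other has only 0.1. The weighted mean gives both slots the same update (2.0, the mean of the values). The scaled weighted sum with `scale=3.0` gives 1.8 and 0.2, so the second slot's update stays nine times smaller. This shows one way that renormalization discards how much attention mass a slot actually receives.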
Abstract
Object-centric scene decompositions are important representations for downstream tasks in fields such as computer vision and robotics. The recently proposed Slot Attention module, already used by several derivative works for image segmentation and object tracking in videos, is a deep learning component that performs unsupervised object-centric scene decomposition on input images. It is based on an attention architecture in which latent slot vectors, which hold compressed information about objects, attend to localized perceptual features from the input image. In this paper, we demonstrate that design decisions on normalizing the aggregated values in the attention architecture have a considerable impact on the ability of Slot Attention to generalize to a higher number of slots and objects than seen during training. We propose and investigate alternatives to the original normalization…
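The attention step in which slots attend to input features can be sketched as follows. The sketch assumes the original Slot Attention formulation, where the softmax runs over the slot axis so that slots compete for each input feature. It leaves out the learned projections, layer normalization, and the GRU/MLP slot update. The function and argument names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def slot_attention_weights(queries: Matrix, keys: Matrix) -> Matrix:
    """Return K x N attention weights with the softmax taken over slots.

    queries: K x D vectors projected from the slots (projection omitted).
    keys:    N x D vectors projected from the input features (projection omitted).
    Hypothetical helper name; mirrors the dot-product attention with
    1/sqrt(D) scaling used in the original Slot Attention.
    """
    num_slots, num_inputs = len(queries), len(keys)
    scale = 1.0 / math.sqrt(len(queries[0]))
    attn = [[0.0] * num_inputs for _ in range(num_slots)]
    for n in range(num_inputs):
        # Logits of every slot for input feature n.
        logits = [scale * sum(q * k for q, k in zip(queries[s], keys[n]))
                  for s in range(num_slots)]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        for s in range(num_slots):
            attn[s][n] = exps[s] / total  # slots compete for input n
    return attn
```

Each column of the result sums to one across slots, so every input feature splits a single unit of attention among the slots. The paper studies the next step: how these weighted values are normalized per slot once they are aggregated.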
