Unlocking Slot Attention by Changing Optimal Transport Costs

Yan Zhang; David W. Zhang; Simon Lacoste-Julien; Gertjan J. Burghouts; Cees G. M. Snoek

arXiv:2301.13197 · cs.LG · June 1, 2023 · 1 citation

TL;DR

This paper introduces MESH, a novel cross-attention module that enhances slot attention by integrating optimal transport techniques, enabling better handling of dynamic object counts in videos and improving performance on object-centric benchmarks.

Contribution

It establishes a connection between slot attention and optimal transport, and proposes MESH, a new method that combines unregularized and regularized optimal transport for improved object modeling.

Findings

01 MESH significantly outperforms standard slot attention on multiple benchmarks.

02 The method effectively handles videos with a dynamic number of objects.

03 MESH improves tie-breaking in object-centric modeling.
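
To make the tie-breaking problem concrete, the following is a minimal NumPy sketch of why a set-equivariant attention step cannot separate two identical slots. It is not the paper's implementation: the function name `slot_attention_weights` is hypothetical, and the learned projections and recurrent update of real slot attention are omitted as a simplifying assumption. Only the softmax over slots and the weighted-mean update are kept.

```python
import numpy as np

def slot_attention_weights(slots, inputs):
    """Softmax over the slot axis (simplified; no learned projections)."""
    logits = inputs @ slots.T / np.sqrt(slots.shape[1])  # (num_inputs, num_slots)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
inputs = rng.normal(size=(6, 4))                          # 6 input features, dim 4
slot = rng.normal(size=(1, 4))
slots = np.vstack([slot, slot, rng.normal(size=(1, 4))])  # slots 0 and 1 are tied

attn = slot_attention_weights(slots, inputs)

# Weighted-mean update: normalize each slot's attention over the inputs.
updates = (attn / attn.sum(axis=0, keepdims=True)).T @ inputs  # (num_slots, dim)

# Identical slots receive identical attention and therefore identical updates.
tied = np.allclose(updates[0], updates[1])
print("tied slots stay tied:", tied)
```

Because the two tied slots are treated identically at every step, repeating the update never separates them. This is the limitation the tie-breaking finding refers to.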

Abstract

Slot attention is a powerful method for object-centric modeling in images and videos. However, its set-equivariance limits its ability to handle videos with a dynamic number of objects because it cannot break ties. To overcome this limitation, we first establish a connection between slot attention and optimal transport. Based on this new perspective we propose MESH (Minimize Entropy of Sinkhorn): a cross-attention module that combines the tiebreaking properties of unregularized optimal transport with the speed of regularized optimal transport. We evaluate slot attention using MESH on multiple object-centric learning benchmarks and find significant improvements over slot attention in every setting.
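
The idea of minimizing the entropy of a Sinkhorn transport plan can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions, not the authors' algorithm or the official PyTorch code. The function names (`sinkhorn`, `plan_entropy`, `minimize_sinkhorn_entropy`) and all hyperparameters are hypothetical. Central finite differences stand in for automatic differentiation, a simple backtracking step is added so the entropy never increases, and small noise is added to the cost so that tied entries can separate.

```python
import numpy as np

def _logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def sinkhorn(cost, eps=1.0, n_iters=50):
    """Entropy-regularized transport plan with uniform marginals (log domain)."""
    n, m = cost.shape
    log_p = -cost / eps
    for _ in range(n_iters):
        log_p = log_p - _logsumexp(log_p, axis=1) - np.log(n)  # fix row sums
        log_p = log_p - _logsumexp(log_p, axis=0) - np.log(m)  # fix column sums
    return np.exp(log_p)

def plan_entropy(plan):
    return float(-(plan * np.log(plan + 1e-12)).sum())

def entropy_grad(cost, eps, h=1e-4):
    """Finite-difference gradient of the plan entropy w.r.t. the cost.
    Assumption: stands in for autograd to keep the sketch dependency-free."""
    grad = np.zeros_like(cost)
    for idx in np.ndindex(*cost.shape):
        bump = np.zeros_like(cost)
        bump[idx] = h
        hi = plan_entropy(sinkhorn(cost + bump, eps))
        lo = plan_entropy(sinkhorn(cost - bump, eps))
        grad[idx] = (hi - lo) / (2 * h)
    return grad

def minimize_sinkhorn_entropy(cost, eps=1.0, n_steps=5, lr=1.0):
    """Take a few descent steps on the cost, then return the Sinkhorn plan."""
    c = cost.copy()
    for _ in range(n_steps):
        grad = entropy_grad(c, eps)
        current = plan_entropy(sinkhorn(c, eps))
        step = lr
        for _try in range(20):  # backtracking: accept only non-increasing entropy
            candidate = c - step * grad
            if plan_entropy(sinkhorn(candidate, eps)) <= current:
                c = candidate
                break
            step *= 0.5
    return sinkhorn(c, eps)

rng = np.random.default_rng(0)
cost = np.zeros((3, 3))                                  # fully tied costs
noisy_cost = cost + 1e-3 * rng.normal(size=cost.shape)   # assumption: noise breaks ties
baseline_plan = sinkhorn(noisy_cost)
sharp_plan = minimize_sinkhorn_entropy(noisy_cost)
print(plan_entropy(baseline_plan), plan_entropy(sharp_plan))
```

In this sketch the regularized plan for nearly tied costs is close to uniform, and each accepted step on the cost moves the plan toward a sharper, lower-entropy assignment while still using only the fast Sinkhorn iterations.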

Code & Models

Repositories

davzha/mesh (PyTorch, official)

Videos

Unlocking Slot Attention by Changing Optimal Transport Costs · YouTube

Unlocking Slot Attention by Changing Optimal Transport Costs · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection

Methods: Concatenated Skip Connection · Softmax · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings