Unsupervised Multi-object Segmentation Using Attention and Soft-argmax

Bruno Sauvalle; Arnaud de La Fortelle

arXiv:2205.13271 · cs.CV · September 1, 2022 · 1 citation

TL;DR

This paper presents an unsupervised architecture for multi-object segmentation that leverages attention mechanisms and a transformer encoder to improve detection and segmentation accuracy in complex scenes.

Contribution

The novel architecture combines attention, transformer, and autoencoder components for unsupervised multi-object segmentation, outperforming previous methods on synthetic benchmarks.

Findings

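01 Significantly outperforms the state of the art on complex synthetic benchmarks

02 Uses a translation-equivariant attention mechanism to locate objects, and a transformer encoder to handle occlusions and redundant detections

03 Reconstructs the background with a convolutional autoencoder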

Abstract

We introduce a new architecture for unsupervised object-centric representation learning and multi-object detection and segmentation, which uses a translation-equivariant attention mechanism to predict the coordinates of the objects present in the scene and to associate a feature vector to each object. A transformer encoder handles occlusions and redundant detections, and a convolutional autoencoder is in charge of background reconstruction. We show that this architecture significantly outperforms the state of the art on complex synthetic benchmarks.
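
The soft-argmax named in the title is, in its generic form, a differentiable stand-in for argmax: a score map is turned into softmax weights, and the predicted coordinate is the weighted mean of the pixel positions. The sketch below illustrates that generic operation only; it is not taken from the paper or its repository, and the names `soft_argmax_2d`, `make_peak`, and the sharpness parameter `beta` are assumptions made for illustration. It also shows why such a readout is translation-equivariant: shifting the score map by one pixel shifts the predicted coordinates by one pixel.

```python
import math


def soft_argmax_2d(heatmap, beta=10.0):
    """Return the softmax-weighted mean (row, col) of a 2D score map.

    beta is a hypothetical sharpness parameter: larger values move the
    result closer to the hard argmax.
    """
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(max(row) for row in heatmap)
    total = 0.0
    row_acc = 0.0
    col_acc = 0.0
    for i, row in enumerate(heatmap):
        for j, score in enumerate(row):
            weight = math.exp(beta * (score - max_score))
            total += weight
            row_acc += weight * i
            col_acc += weight * j
    return row_acc / total, col_acc / total


def make_peak(height, width, peak_row, peak_col):
    """Build a toy score map: negative squared distance to a single peak."""
    return [
        [-((i - peak_row) ** 2 + (j - peak_col) ** 2) for j in range(width)]
        for i in range(height)
    ]


# A peak at (3, 5), and the same peak translated by one pixel on each axis.
heatmap = make_peak(9, 9, 3, 5)
shifted = make_peak(9, 9, 4, 6)

r, c = soft_argmax_2d(heatmap)
r2, c2 = soft_argmax_2d(shifted)
print(f"original: ({r:.3f}, {c:.3f})  shifted: ({r2:.3f}, {c2:.3f})")
```

Because the output is a smooth function of the scores, gradients can flow from the predicted coordinates back to the network that produced the score map, which a hard argmax would not allow.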

Code & Models

Repositories

BrunoSauvalle/AST
PyTorch · Official

Videos

Unsupervised multi-object segmentation using attention and soft-argmax · YouTube

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization