Differentiable Soft-Masked Attention
Ali Athar, Jonathon Luiten, Alexander Hermans, Deva Ramanan, Bastian Leibe

TL;DR
This paper introduces a differentiable soft-masked attention mechanism for transformers, enabling learning of soft masks within the network, and applies it to weakly-supervised video object segmentation with promising results.
Contribution
We propose a novel differentiable soft-masked attention method that allows mask learning without direct supervision, enhancing weakly-supervised video object segmentation.
Findings
Effective segmentation in unlabeled frames due to the novel attention formulation
Achieved weakly-supervised VOS with only one annotated image frame
Code available for implementation and further research
Abstract
Transformers have become prevalent in computer vision due to their performance and flexibility in modelling complex operations. Of particular significance is the 'cross-attention' operation, which allows a vector representation (e.g. of an object in an image) to be learned by attending to an arbitrarily sized set of input features. Recently, "Masked Attention" was proposed in which a given object representation only attends to those image pixel features for which the segmentation mask of that object is active. This specialization of attention proved beneficial for various image and video segmentation tasks. In this paper, we propose another specialization of attention which enables attending over 'soft-masks' (those with continuous mask probabilities instead of binary values), and is also differentiable through these mask probabilities, thus allowing the mask used for attention to be…
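The abstract does not spell out the formula, so the following is only a minimal sketch under an explicit assumption: soft masking is modelled here by adding the log of each object's mask probabilities to the scaled dot-product cross-attention logits. This is one plausible formulation chosen for illustration, not confirmed to be the authors' exact method. It shows the two properties the abstract names: with a binary mask it approximately recovers hard "Masked Attention", and with continuous probabilities the output is differentiable with respect to the mask. The function names `softmax` and `soft_masked_cross_attention` and the `eps` parameter are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def soft_masked_cross_attention(queries, keys, values, mask_probs, eps=1e-6):
    """Hypothetical soft-masked cross-attention (illustrative assumption).

    queries:    (N, D) object representations
    keys:       (P, D) pixel features used as keys
    values:     (P, D) pixel features used as values
    mask_probs: (N, P) soft mask probabilities in [0, 1], one row per object

    Assumed formulation: add log(mask_probs) to the scaled dot-product
    logits. This multiplies each pixel's unnormalised weight by its mask
    probability before renormalising, and is smooth in mask_probs
    wherever mask_probs > eps, so gradients can reach the mask.
    """
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)                   # (N, P)
    logits = logits + np.log(np.clip(mask_probs, eps, 1.0))  # soft-mask bias
    weights = softmax(logits, axis=-1)                       # rows sum to 1
    return weights @ values, weights                         # (N, D), (N, P)


# Usage: 2 object queries attending over 5 pixel features.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 8))
k = rng.normal(size=(5, 8))
v = rng.normal(size=(5, 8))

# A binary mask approximates hard masked attention: pixels with
# probability 0 receive near-zero weight.
hard = np.array([[1, 1, 0, 0, 0], [0, 0, 1, 1, 1]], dtype=float)
out_hard, w_hard = soft_masked_cross_attention(q, k, v, hard)

# A soft mask down-weights pixels in proportion to their probability.
soft = rng.uniform(0.05, 1.0, size=(2, 5))
out_soft, w_soft = soft_masked_cross_attention(q, k, v, soft)
```

Under this assumed formulation, each attention weight is w_j = m_j exp(l_j) / sum_k m_k exp(l_k), so its gradient with respect to its own mask value is w_j (1 - w_j) / m_j. That nonzero gradient is what would let a loss on the attended features push the mask probabilities, in the spirit of the mask being learned without direct supervision. A real implementation would use an autodiff framework rather than NumPy.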
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection
Methods: VOS
