Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation
Markus Karmann, Onay Urfalioglu

TL;DR
This paper introduces a novel, training-free unsupervised image segmentation method that leverages the self-attention mechanism of Stable Diffusion to produce high-quality, interactive segmentation results with fewer manual clicks.
Contribution
The authors propose a new approach that interprets Stable Diffusion's self-attention as a Markov transition operator, enabling unsupervised segmentation without training.
Findings
Outperforms many training-based unsupervised methods in click efficiency
Produces sharper semantic boundaries and less noisy segmentation maps
Effective in interactive point prompt segmentation tasks
Abstract
Recent progress in interactive point-prompt-based image segmentation makes it possible to significantly reduce the manual effort needed to obtain high-quality semantic labels. State-of-the-art unsupervised methods use self-supervised pre-trained models to obtain pseudo-labels, which are then used to train a prompt-based segmentation model. In this paper, we propose a novel unsupervised and training-free approach based solely on the self-attention of Stable Diffusion. We interpret the self-attention tensor as a Markov transition operator, which enables us to iteratively construct a Markov chain. Pixel-wise counting of the number of iterations along the Markov chain required to reach a relative probability threshold yields a Markov-iteration-map, which we simply call a Markov-map. Compared to the raw attention maps, we show that our proposed Markov-map has less noise, sharper semantic boundaries and more…
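The iteration-counting idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes the self-attention has already been flattened into an N x N matrix over pixels, that the chain starts as a one-hot distribution at the prompt pixel, and that the "relative probability threshold" is measured against the distribution after the final iteration. The abstract does not specify that last choice, so it is only one plausible reading. The names markov_map, start_index, threshold and max_iters are hypothetical.

```python
import numpy as np


def markov_map(attention, start_index, threshold=0.5, max_iters=100):
    """Count, per pixel, the iterations needed to reach a relative threshold.

    Illustrative sketch only. `attention` is an (N, N) non-negative matrix;
    the reference distribution used for the threshold is an assumption.
    """
    attention = np.asarray(attention, dtype=np.float64)
    # Normalise rows so each row is a probability distribution.
    transition = attention / attention.sum(axis=1, keepdims=True)

    n = transition.shape[0]
    state = np.zeros(n)
    state[start_index] = 1.0  # all probability mass on the prompt pixel

    states = [state]
    for _ in range(max_iters):
        state = state @ transition  # one step along the Markov chain
        states.append(state)
    states = np.stack(states)  # shape: (max_iters + 1, N)

    # Assumption: the last iterate stands in for the limiting distribution.
    reference = states[-1]
    reached = states >= threshold * reference

    # Index of the first iteration at which each pixel crosses its threshold.
    first = np.argmax(reached, axis=0)
    # Pixels that never receive probability mass get the maximum count.
    first[reference <= 0] = max_iters
    return first


# Toy example: pixels 0-1 and pixels 2-3 form two weakly linked groups.
toy_attention = np.array([
    [0.60, 0.39, 0.01, 0.00],
    [0.39, 0.60, 0.01, 0.00],
    [0.01, 0.01, 0.58, 0.40],
    [0.00, 0.00, 0.40, 0.60],
])
toy_map = markov_map(toy_attention, start_index=0)
```

In the toy example, the pixel that attends strongly to the prompt pixel crosses its threshold after fewer iterations than the pixels in the weakly linked group. Thresholding such a map would separate the two groups.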
Taxonomy
Topics: Video Analysis and Summarization · Human Pose and Action Recognition · Multimodal Machine Learning Applications
Methods: Softmax · Attention Is All You Need · Diffusion
