Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation

Markus Karmann; Onay Urfalioglu

arXiv:2411.10411·cs.CV·March 21, 2025

TL;DR

This paper introduces a novel, training-free unsupervised image segmentation method that leverages the self-attention mechanism of Stable Diffusion to produce high-quality, interactive segmentation results with fewer manual clicks.

Contribution

The authors propose a new approach that interprets Stable Diffusion's self-attention as a Markov transition operator, enabling unsupervised segmentation without training.

Findings

01. Outperforms many training-based unsupervised methods in click efficiency

02. Produces sharper semantic boundaries and less noisy segmentation maps

03. Effective in interactive point prompt segmentation tasks

Abstract

Recent progress in interactive, point-prompt-based image segmentation significantly reduces the manual effort needed to obtain high-quality semantic labels. State-of-the-art unsupervised methods use self-supervised pre-trained models to obtain pseudo-labels, which are then used to train a prompt-based segmentation model. In this paper, we propose a novel unsupervised and training-free approach based solely on the self-attention of Stable Diffusion. We interpret the self-attention tensor as a Markov transition operator, which enables us to iteratively construct a Markov chain. Counting, for each pixel, the number of iterations along the Markov chain required to reach a relative probability threshold yields a Markov-iteration-map, which we simply call a Markov-map. Compared to the raw attention maps, we show that our proposed Markov-map has less noise, sharper semantic boundaries and more…
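
The iteration counting described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a flattened, row-stochastic attention matrix of shape (N, N), a one-hot start distribution at the prompt pixel, and a threshold taken relative to the current maximum of the propagated distribution; the abstract does not specify what the threshold is relative to. The function name `markov_map` and the parameters `rel_threshold` and `max_iters` are hypothetical.

```python
import numpy as np


def markov_map(attention, start_index, rel_threshold=0.5, max_iters=100):
    """Toy Markov-iteration-map (illustrative sketch, not the paper's code).

    attention     : (N, N) non-negative matrix; row i holds the attention of
                    pixel i over all pixels (assumed layout).
    start_index   : flattened index of the prompt pixel (assumed start state).
    rel_threshold : fraction of the current maximum probability a pixel must
                    reach (assumed definition of "relative threshold").
    max_iters     : iteration cap; pixels never reached keep this value.
    """
    attention = np.asarray(attention, dtype=np.float64)
    n = attention.shape[0]

    # Normalise rows so the matrix is a valid Markov transition operator.
    transition = attention / attention.sum(axis=1, keepdims=True)

    # One-hot distribution concentrated on the prompt pixel.
    state = np.zeros(n)
    state[start_index] = 1.0

    # max_iters doubles as the "not reached yet" sentinel.
    iterations = np.full(n, max_iters, dtype=np.int64)

    for step in range(max_iters):
        # Record the first step at which each pixel crosses the threshold.
        reached = (state >= rel_threshold * state.max()) & (iterations == max_iters)
        iterations[reached] = step
        # Advance the chain by one transition.
        state = state @ transition

    return iterations
```

On a toy block-diagonal matrix with two disconnected groups of two pixels (each row spreading 0.5 over its own group), starting at pixel 0 with `max_iters=10` gives the prompt pixel a count of 0 and its neighbour a count of 1, while the two pixels in the other group are never reached and keep the value 10. A segment could then be read off by thresholding the returned map.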

Taxonomy

Topics: Video Analysis and Summarization · Human Pose and Action Recognition · Multimodal Machine Learning Applications

Methods: Softmax · Attention Is All You Need · Diffusion