Making Training-Free Diffusion Segmentors Scale with the Generative Power

Benyuan Meng; Qianqian Xu; Zitai Wang; Xiaochun Cao; Longtao Huang; Qingming Huang

arXiv:2603.06178·cs.CV·March 30, 2026

TL;DR

This paper improves training-free diffusion-based semantic segmentation by resolving discrepancies in cross-attention maps, so that segmentation accuracy benefits from more powerful generative models.

Contribution

The paper identifies two gaps in existing methods and proposes auto aggregation and per-pixel rescaling to improve segmentation performance without additional training.

Findings

01 Improved segmentation accuracy on standard benchmarks.

02 Effective integration with generative techniques for broader applicability.

03 Addresses attention map discrepancies in diffusion models.

Abstract

As powerful generative models, text-to-image diffusion models have recently been explored for discriminative tasks. A line of research focuses on adapting a pre-trained diffusion model to semantic segmentation without any further training, leading to training-free diffusion segmentors. These methods typically rely on cross-attention maps from the model's attention layers, which are assumed to capture semantic relationships between image pixels and text tokens. Ideally, such approaches should benefit from more powerful diffusion models, i.e., stronger generative capability should lead to better segmentation. However, we observe that existing methods often fail to scale accordingly. To understand this issue, we identify two underlying gaps: (i) cross-attention is computed across multiple heads and layers, but there exists a discrepancy between these individual attention maps and a unified…
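
To make the setting concrete, here is a minimal sketch of the kind of baseline pipeline the abstract refers to: cross-attention maps from several layers and heads are merged into one pixel-to-token score map, and each pixel takes the label of its highest-scoring token. It is an illustration under assumptions, not the paper's method. Uniform averaging stands in for whatever aggregation a given segmentor uses, and the names `aggregate_uniform` and `segment`, the nested-list layout, and the toy values are all hypothetical.

```python
from statistics import fmean


def aggregate_uniform(maps):
    """Average cross-attention maps over all layers and heads.

    Assumed layout (hypothetical): maps[l][h][p][t] is the attention weight
    of pixel p on text token t in head h of layer l, with every map already
    resized to the same spatial resolution. Returns agg[p][t].
    """
    # Flatten the layer and head axes into one list of (pixel x token) maps.
    flat = [head for layer in maps for head in layer]
    n_pixels = len(flat[0])
    n_tokens = len(flat[0][0])
    return [
        [fmean(m[p][t] for m in flat) for t in range(n_tokens)]
        for p in range(n_pixels)
    ]


def segment(agg):
    """Label each pixel with the index of its highest-scoring token."""
    return [max(range(len(row)), key=row.__getitem__) for row in agg]


# Toy input: 2 layers x 1 head, 3 pixels, 2 class tokens (made-up values).
toy_maps = [
    [[[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]],
    [[[0.7, 0.3], [0.3, 0.7], [0.1, 0.9]]],
]
labels = segment(aggregate_uniform(toy_maps))
```

Uniform averaging treats every head and layer as equally informative. The first gap named in the abstract concerns exactly this step, the mismatch between individual attention maps and a single unified map.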

Code & Models

Repositories

Darkbblue/goca (GitHub)
