Split-Fuse-Transport: Annotation-Free Saliency via Dual Clustering and Optimal Transport Alignment
Muhammad Umer Ramzan, Ali Zia, Abdelwahed Khamis, Noman Ali, Usman Ali, Wei Xiang

TL;DR
AutoSOD is an unsupervised salient object detection method that uses dual clustering and optimal transport to generate reliable pseudo-masks, improving F-measure by up to 26% over prior unsupervised methods without any pixel-level labels.
Contribution
The paper proposes POTNet, which combines an entropy-guided dual-clustering head with optimal transport alignment of the resulting prototypes, enabling near-supervised SOD accuracy in an end-to-end unsupervised pipeline without handcrafted priors.
Findings
Outperforms existing unsupervised methods by up to 26% in F-measure.
Outperforms weakly supervised methods by up to 36% in F-measure.
Narrows the gap to fully supervised models.
Abstract
Salient object detection (SOD) aims to segment visually prominent regions in images and serves as a foundational task for various computer vision applications. We posit that SOD can now reach near-supervised accuracy without a single pixel-level label, but only when reliable pseudo-masks are available. We revisit the prototype-based line of work and make two key observations. First, boundary pixels and interior pixels obey markedly different geometry; second, the global consistency enforced by optimal transport (OT) is underutilized if prototype quality is weak. To address this, we introduce POTNet, an adaptation of Prototypical Optimal Transport that replaces POT's single k-means step with an entropy-guided dual-clustering head: high-entropy pixels are organized by spectral clustering, low-entropy pixels by k-means, and the two prototype sets are subsequently aligned by OT. This…
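The split described in the abstract (high-entropy pixels clustered spectrally, low-entropy pixels clustered by k-means, the two prototype sets aligned by OT) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the median entropy threshold, the Gaussian affinity with a median-distance bandwidth, the squared-Euclidean cost, the uniform marginals, the Sinkhorn solver, and every function name below are assumptions chosen only to make the idea concrete on toy data.

```python
import numpy as np

def pixel_entropy(probs):
    """Shannon entropy per pixel; probs has shape (N, C), rows sum to 1."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = x[labels == j].mean(0)
    return labels

def spectral(x, k, seed=0):
    """Normalized spectral clustering with a Gaussian affinity (assumed kernel)."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (np.median(d2) + 1e-12))  # median-distance bandwidth heuristic
    np.fill_diagonal(w, 0.0)
    deg = w.sum(1) + 1e-12
    m = w / np.sqrt(deg[:, None] * deg[None, :])  # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(m)  # eigenvalues in ascending order
    emb = vecs[:, -k:]  # eigenvectors of the k largest eigenvalues
    emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    return kmeans(emb, k, seed=seed)

def prototypes(x, labels):
    """Mean feature of each non-empty cluster."""
    return np.stack([x[labels == j].mean(0) for j in np.unique(labels)])

def sinkhorn(cost, reg=0.1, iters=200):
    """Entropy-regularized OT plan between two uniform marginals."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    kernel = np.exp(-cost / reg)
    u = np.ones(n)
    for _ in range(iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]

def split_fuse_transport(feats, probs, k=3):
    """Split pixels by entropy, cluster each group, align prototypes by OT."""
    ent = pixel_entropy(probs)
    high = ent > np.median(ent)  # assumed threshold: median entropy
    protos_hi = prototypes(feats[high], spectral(feats[high], k))
    protos_lo = prototypes(feats[~high], kmeans(feats[~high], k))
    cost = ((protos_hi[:, None, :] - protos_lo[None, :, :]) ** 2).sum(-1)
    plan = sinkhorn(cost / (cost.max() + 1e-12))  # rescale cost to [0, 1]
    return protos_hi, protos_lo, plan

# Toy data: 200 "pixels" with 8-dim features and 2-class soft predictions.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 8))
logits = rng.normal(size=(200, 2))
probs = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
protos_hi, protos_lo, plan = split_fuse_transport(feats, probs)
print(plan.shape, plan.sum())
```

Each entry `plan[i, j]` is the mass moved from high-entropy prototype `i` to low-entropy prototype `j`, so a large entry marks a pair of prototypes that the transport treats as corresponding. How the paper actually uses the aligned prototypes to form pseudo-masks is not stated in the truncated abstract and is not modeled here.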
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Image and Video Quality Assessment
