Direct Segmentation without Logits Optimization for Training-Free Open-Vocabulary Semantic Segmentation
Jiahao Li, Yang Lu, Yachao Zhang, Fangyong Wang, Yuan Xie, Yanyun Qu

TL;DR
This paper introduces a training-free method for open-vocabulary semantic segmentation that directly derives semantic maps from distribution discrepancy without iterative training or attention modulation.
Contribution
It proposes an analytic solution to the distribution discrepancy for segmentation, eliminating the need for logits optimization and training.
Findings
Achieves state-of-the-art performance on eight benchmark datasets.
Eliminates the need for iterative training and model-specific attention modulation.
Provides a direct, analytic approach to semantic segmentation.
Abstract
Open-vocabulary semantic segmentation (OVSS) aims to segment arbitrary category regions in images using open-vocabulary prompts, which requires methods to possess pixel-level vision-language alignment capability. Typically, this capability involves computing the cosine similarity, i.e., logits, between visual and linguistic features, and minimizing the distribution discrepancy between the logits and the ground truth (GT) to generate optimal logits that are subsequently used to construct segmentation maps. This process, however, depends on time-consuming iterative training or model-specific attention modulation. In this work, we propose a more direct approach that eschews the logits-optimization process by directly deriving an analytic solution for the segmentation map. We posit a key hypothesis: the distribution discrepancy encodes semantic information; specifically, this discrepancy exhibits…
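To make the conventional pipeline described in the abstract concrete, the following is a minimal sketch of the logits step only: cosine similarity between per-pixel visual features and per-class text features, followed by an argmax to build a segmentation map. It does not implement the paper's analytic solution, which the abstract excerpt here does not spell out. The function names (`cosine_logits`, `segment`), the array shapes, and the toy feature values are assumptions chosen for illustration; in practice the features would come from a vision-language model.

```python
import numpy as np


def cosine_logits(pixel_feats, text_feats, eps=1e-8):
    """Cosine similarity between pixel and text features.

    pixel_feats: array of shape (H, W, D), one feature vector per pixel.
    text_feats:  array of shape (C, D), one feature vector per class prompt.
    Returns logits of shape (H, W, C).
    """
    # L2-normalize both sets of features so the dot product is a cosine.
    p = pixel_feats / (np.linalg.norm(pixel_feats, axis=-1, keepdims=True) + eps)
    t = text_feats / (np.linalg.norm(text_feats, axis=-1, keepdims=True) + eps)
    return p @ t.T


def segment(pixel_feats, text_feats):
    """Assign each pixel the class whose text feature is most similar."""
    return cosine_logits(pixel_feats, text_feats).argmax(axis=-1)


# Toy example (hypothetical values): 2 classes, a 2x2 "image", 2-D features.
text_feats = np.array([[1.0, 0.0],
                       [0.0, 1.0]])
pixel_feats = np.array([[[2.0, 0.1], [0.2, 3.0]],
                        [[1.0, 0.9], [0.1, 0.5]]])

logits = cosine_logits(pixel_feats, text_feats)
seg_map = segment(pixel_feats, text_feats)
```

Here `seg_map` holds one class index per pixel. The methods the abstract contrasts itself with would further optimize these logits through training or attention modulation before taking the argmax.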
