OV-Stitcher: A Global Context-Aware Framework for Training-Free Open-Vocabulary Semantic Segmentation
Seungjae Moon, Seunghyun Oh, Youngmin Ro

TL;DR
OV-Stitcher is a training-free framework that enhances open-vocabulary semantic segmentation by enabling global attention through feature stitching, leading to more coherent and accurate segmentation maps.
Contribution
It introduces a novel feature stitching method within the final encoder block to enable global attention without additional training.
Findings
Improves mIoU from 48.7 to 50.7 on benchmarks.
Enables global attention in training-free segmentation.
Outperforms prior training-free methods.
Abstract
Training-free open-vocabulary semantic segmentation (TF-OVSS) has recently attracted attention for its ability to perform dense prediction by leveraging the pretrained knowledge of large vision and vision-language models, without requiring additional training. However, due to the limited input resolution of these pretrained encoders, existing TF-OVSS methods commonly adopt a sliding-window strategy that processes cropped sub-images independently. While effective for managing high-resolution inputs, this approach prevents global attention over the full image, leading to fragmented feature representations and limited contextual reasoning. We propose OV-Stitcher, a training-free framework that addresses this limitation by stitching fragmented sub-image features directly within the final encoder block. By reconstructing attention representations from fragmented sub-image features,…
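To make the sliding-window limitation concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes per-window features have already been computed, stitches them onto one full-image token grid by averaging overlaps, and then runs a single attention pass over all tokens at once. The helper names (`window_coords`, `stitch`, `self_attention`), the overlap averaging, and the identity query/key/value projections are all illustrative assumptions; the abstract only states that stitching happens inside the final encoder block and does not specify these details.

```python
import numpy as np

def window_coords(height, width, win, stride):
    """Top-left corners of sliding windows over a height x width token grid.

    Assumes win <= height and win <= width (hypothetical helper).
    """
    ys = list(range(0, height - win + 1, stride))
    xs = list(range(0, width - win + 1, stride))
    # Add a final window so the bottom and right borders are always covered.
    if ys[-1] + win < height:
        ys.append(height - win)
    if xs[-1] + win < width:
        xs.append(width - win)
    return [(y, x) for y in ys for x in xs]

def stitch(window_feats, coords, height, width, win):
    """Average overlapping window features into one (height, width, dim) grid.

    Averaging is an assumption made for this sketch, not the paper's rule.
    """
    dim = window_feats[0].shape[-1]
    total = np.zeros((height, width, dim))
    count = np.zeros((height, width, 1))
    for feats, (y, x) in zip(window_feats, coords):
        total[y:y + win, x:x + win] += feats
        count[y:y + win, x:x + win] += 1
    return total / count

def self_attention(tokens):
    """Single-head softmax attention with identity projections (illustrative)."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ tokens

# Toy example: a 6x6 token grid covered by four overlapping 4x4 windows.
rng = np.random.default_rng(0)
H, W, WIN, STRIDE, DIM = 6, 6, 4, 2, 8
coords = window_coords(H, W, WIN, STRIDE)
window_feats = [rng.normal(size=(WIN, WIN, DIM)) for _ in coords]

# Stitch first, then attend: every token can now attend to every other token
# in the image, rather than only to tokens inside its own window.
grid = stitch(window_feats, coords, H, W, WIN)
out = self_attention(grid.reshape(-1, DIM)).reshape(H, W, DIM)
```

In the independent sliding-window baseline, `self_attention` would instead be applied to each entry of `window_feats` separately, so tokens in different windows never interact; attending over the stitched grid is what restores image-wide context in this toy setting.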
