OV-Stitcher: A Global Context-Aware Framework for Training-Free Open-Vocabulary Semantic Segmentation

Seungjae Moon; Seunghyun Oh; Youngmin Ro

arXiv:2604.08110·cs.CV·April 13, 2026

TL;DR

OV-Stitcher is a training-free framework that enhances open-vocabulary semantic segmentation by enabling global attention through feature stitching, leading to more coherent and accurate segmentation maps.

Contribution

OV-Stitcher introduces a feature stitching method, applied within the final encoder block, that enables global attention over the full image without additional training.

Findings

01

Improves mean Intersection over Union (mIoU) from 48.7 to 50.7 on the evaluated benchmarks.

02

Enables global attention in training-free segmentation.

03

Outperforms prior training-free methods.

Abstract

Training-free open-vocabulary semantic segmentation (TF-OVSS) has recently attracted attention for its ability to perform dense prediction by leveraging the pretrained knowledge of large vision and vision-language models, without requiring additional training. However, because these pretrained encoders accept only a limited input resolution, existing TF-OVSS methods commonly adopt a sliding-window strategy that processes cropped sub-images independently. While effective for handling high-resolution inputs, this approach prevents global attention over the full image, leading to fragmented feature representations and limited contextual reasoning. We propose OV-Stitcher, a training-free framework that addresses this limitation by stitching fragmented sub-image features directly within the final encoder block. By reconstructing attention representations from fragmented sub-image features,…
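The contrast the abstract draws between independent sliding windows and attention over stitched features can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`split_windows`, `windowed_attention`, `stitched_attention`), the non-overlapping windows, and the use of the raw tokens as queries, keys, and values (no learned projections) are all assumptions made to keep the example short. It also assumes the grid height and width are divisible by the window size.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Plain scaled dot-product attention over (N, D) token matrices.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def split_windows(tokens, win):
    # Hypothetical helper: cut an (H, W, D) token grid into
    # non-overlapping win x win windows, remembering each window's origin.
    H, W, D = tokens.shape
    windows = []
    for r in range(0, H, win):
        for c in range(0, W, win):
            windows.append(((r, c), tokens[r:r + win, c:c + win].reshape(-1, D)))
    return windows

def windowed_attention(tokens, win):
    # Sliding-window style: each window attends only to its own tokens.
    D = tokens.shape[-1]
    out = np.zeros_like(tokens)
    for (r, c), t in split_windows(tokens, win):
        out[r:r + win, c:c + win] = attention(t, t, t).reshape(win, win, D)
    return out

def stitched_attention(tokens, win):
    # Stitching style (assumed simplification): place every window's
    # tokens back on one canvas, then run a single attention over all of them.
    H, W, D = tokens.shape
    canvas = np.zeros_like(tokens)
    for (r, c), t in split_windows(tokens, win):
        canvas[r:r + win, c:c + win] = t.reshape(win, win, D)
    flat = canvas.reshape(-1, D)
    return attention(flat, flat, flat).reshape(H, W, D)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = rng.normal(size=(4, 4, 8))  # toy 4x4 grid of 8-dim tokens
    local = windowed_attention(grid, win=2)
    stitched = stitched_attention(grid, win=2)
    print(local.shape, stitched.shape)
```

In the windowed variant, changing tokens in one window leaves the outputs of every other window untouched, which is the fragmentation the abstract describes. In the stitched variant every output token depends on the whole grid.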
