dinov3.seg: Open-Vocabulary Semantic Segmentation with DINOv3

Saikat Dutta; Biplab Banerjee; Hamid Rezatofighi

arXiv:2603.19531·cs.CV·March 23, 2026

Open Access

TL;DR

dinov3.seg advances open-vocabulary semantic segmentation by integrating a tailored architecture, dual-level text embedding, early visual refinement, and high-resolution inference, leading to superior accuracy and robustness in complex scenes.

Contribution

The paper introduces dinov3.seg, a framework for open-vocabulary semantic segmentation (OVSS) that extends dinov3.txt with a task-specific architecture, dual-level text embedding, and a high-resolution inference strategy.

Findings

01. Consistently outperforms state-of-the-art methods on five benchmarks.

02. Enhances spatial precision and robustness in cluttered scenes.

03. Effectively combines semantic and spatial information for dense prediction.

Abstract

Open-Vocabulary Semantic Segmentation (OVSS) assigns pixel-level labels from an open set of text-defined categories, demanding reliable generalization to unseen classes at inference. Although modern vision-language models (VLMs) support strong open-vocabulary recognition, their representations learned through global contrastive objectives remain suboptimal for dense prediction, prompting many OVSS methods to depend on limited adaptation or refinement of image-text similarity maps. This, in turn, restricts spatial precision and robustness in complex, cluttered scenes. We introduce dinov3.seg, extending dinov3.txt into a dedicated framework for OVSS. Our contributions are four-fold. First, we design a task-specific architecture tailored to this backbone, systematically adapting established design principles from prior open-vocabulary segmentation work. Second, we jointly leverage text…
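To make the "image-text similarity maps" mentioned in the abstract concrete, the following is a minimal sketch of the generic similarity-map baseline that many OVSS methods start from. It is not the dinov3.seg architecture. The function names (`l2_normalize`, `similarity_segmentation`), the tensor shapes, and the toy random features are all assumptions made for illustration; a real pipeline would obtain patch features and class-name embeddings from a vision-language model.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to unit length so that dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def similarity_segmentation(patch_feats, text_embeds):
    """Label each patch with the class whose text embedding is most similar.

    patch_feats: (H, W, D) array of per-patch visual features (hypothetical input).
    text_embeds: (C, D) array with one embedding per class name (hypothetical input).
    Returns (labels, sim) with shapes (H, W) and (H, W, C).
    """
    p = l2_normalize(patch_feats)
    t = l2_normalize(text_embeds)
    sim = p @ t.T                      # cosine similarity map, one channel per class
    return sim.argmax(axis=-1), sim

# Toy data standing in for real model outputs (assumption, illustration only).
rng = np.random.default_rng(0)
text_embeds = rng.normal(size=(3, 16))           # 3 classes, 16-dim embeddings
true_labels = rng.integers(0, 3, size=(4, 4))    # 4x4 patch grid
# Each patch feature is a slightly noisy copy of its class embedding.
patch_feats = text_embeds[true_labels] + 0.01 * rng.normal(size=(4, 4, 16))

labels, sim = similarity_segmentation(patch_feats, text_embeds)
```

Because the class set enters only through `text_embeds`, categories can be swapped at inference without retraining. The abstract's point is that maps of this kind, built on globally trained features, tend to lack spatial precision, which is the limitation the paper targets.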

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications