YOLOE-26: Integrating YOLO26 with YOLOE for Real-Time Open-Vocabulary Instance Segmentation

Ranjan Sapkota; Manoj Karkee

arXiv:2602.00168 · cs.CV · February 3, 2026

Open Access

TL;DR

YOLOE-26 is a real-time, open-vocabulary instance segmentation framework that combines YOLO26's efficiency with advanced open-vocabulary reasoning, enabling flexible, prompt-based, and autonomous segmentation in real-world scenarios.

Contribution

It introduces a novel architecture integrating open-vocabulary learning with YOLO26, including a unified embedding space and multiple prompt modalities for real-time segmentation.

Findings

01. Consistent scaling and accuracy-efficiency trade-offs demonstrated across models.

02. Effective zero-shot and prompt-based segmentation in real-time.

03. Compatible with large-scale detection and grounding datasets.

Abstract

This paper presents YOLOE-26, a unified framework that integrates the deployment-optimized YOLO26 (or YOLOv26) architecture with the open-vocabulary learning paradigm of YOLOE for real-time open-vocabulary instance segmentation. Building on the NMS-free, end-to-end design of YOLOv26, the proposed approach preserves the hallmark efficiency and determinism of the YOLO family while extending its capabilities beyond closed-set recognition. YOLOE-26 employs a convolutional backbone with PAN/FPN-style multi-scale feature aggregation, followed by end-to-end regression and instance segmentation heads. A key architectural contribution is the replacement of fixed class logits with an object embedding head, which formulates classification as similarity matching against prompt embeddings derived from text descriptions, visual examples, or a built-in vocabulary. To enable efficient open-vocabulary…
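
The idea of classifying by similarity matching against prompt embeddings, rather than by fixed class logits, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of cosine similarity, the 0.5 threshold, and the toy 3-dimensional embeddings are all assumptions chosen for illustration. In a real model the object embeddings would come from the embedding head and the prompt embeddings from a text or visual encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def classify_by_similarity(object_embeddings, prompt_embeddings, labels, threshold=0.5):
    """Assign each object the label of its most similar prompt embedding.

    Hypothetical helper: cosine similarity and the threshold value are
    illustrative assumptions, not details taken from the paper.
    Returns a list of (label or None, best_score) pairs.
    """
    prompts = [l2_normalize(p) for p in prompt_embeddings]
    results = []
    for obj in object_embeddings:
        obj = l2_normalize(obj)
        # Cosine similarity is the dot product of unit-length vectors.
        scores = [sum(a * b for a, b in zip(obj, p)) for p in prompts]
        best = max(range(len(scores)), key=scores.__getitem__)
        # Objects that match no prompt well enough stay unlabeled.
        label = labels[best] if scores[best] >= threshold else None
        results.append((label, scores[best]))
    return results


# Toy data: two prompts and three candidate objects (made-up values).
labels = ["apple", "leaf"]
prompt_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
object_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.0, 1.0]]

predictions = classify_by_similarity(object_embeddings, prompt_embeddings, labels)
for label, score in predictions:
    print(label, round(score, 3))
```

Because the label set enters only through `prompt_embeddings`, changing the vocabulary in this sketch means swapping the prompt list, with no change to the scoring code. That is the property that lets an embedding head extend beyond closed-set recognition.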

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning