Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation

Gabriele Rosi; Fabio Cermelli

arXiv:2505.06280 · cs.CV · August 7, 2025



TL;DR

This paper introduces Show or Tell (SoT), a benchmark for comparing visual and textual prompts in semantic segmentation across 14 datasets spanning 7 domains, revealing the respective strengths and weaknesses of each prompt type.

Contribution

The paper presents the first comprehensive benchmark evaluating both visual and textual prompts in semantic segmentation under identical conditions.

Findings

1. Open-vocabulary methods perform well on common, easily described concepts.
2. Visual reference prompts show high variability depending on input quality.
3. Complex domains such as tools challenge open-vocabulary approaches.

Abstract

Prompt engineering has shown remarkable success with large language models, yet its systematic exploration in computer vision remains limited. In semantic segmentation, both textual and visual prompts offer distinct advantages: textual prompts through open-vocabulary methods allow segmentation of arbitrary categories, while visual reference prompts provide intuitive reference examples. However, existing benchmarks evaluate these modalities in isolation, without direct comparison under identical conditions. We present Show or Tell (SoT), a novel benchmark specifically designed to evaluate both visual and textual prompts for semantic segmentation across 14 datasets spanning 7 diverse domains (common scenes, urban, food, waste, parts, tools, and land-cover). We evaluate 5 open-vocabulary methods and 4 visual reference prompt approaches, adapting the latter to handle multi-class…
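The abstract states that the visual reference prompt approaches were adapted to handle multi-class segmentation, but the excerpt is cut off before the details. As a purely illustrative sketch, and not the authors' procedure, one common way to turn a binary, one-reference-at-a-time model into a multi-class predictor is to prompt it once per class and take the per-pixel argmax of the resulting foreground scores, labeling low-confidence pixels as background. The function name `merge_binary_predictions`, the `(C, H, W)` score layout, and the default `threshold` are hypothetical assumptions.

```python
import numpy as np


def merge_binary_predictions(scores: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Combine per-class foreground scores into one multi-class label map.

    Hypothetical helper, not taken from the SoT codebase.

    scores: array of shape (C, H, W); scores[c] is assumed to be the
        foreground probability a binary visual-prompt model outputs when
        given the reference example for class c.
    threshold: assumed minimum score for a pixel to receive any class;
        pixels whose best score falls below it are labeled background (0).
    Returns an (H, W) integer map with labels 0 (background) .. C.
    """
    if scores.ndim != 3:
        raise ValueError("expected scores of shape (C, H, W)")
    best_class = scores.argmax(axis=0)   # highest-scoring class index per pixel
    best_score = scores.max(axis=0)      # the score of that class
    labels = best_class + 1              # shift so label 0 is reserved for background
    labels[best_score < threshold] = 0   # low-confidence pixels become background
    return labels


if __name__ == "__main__":
    # Two classes on a 2x2 image (illustrative numbers only).
    scores = np.array([
        [[0.9, 0.2], [0.1, 0.3]],  # class 1
        [[0.4, 0.8], [0.2, 0.6]],  # class 2
    ])
    print(merge_binary_predictions(scores).tolist())
```

The background threshold is the main design choice here: without it, every pixel would be forced into some prompted class, even where no reference example matches.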


Code & Models

Repositories

FocoosAI/ShowOrTell (PyTorch, official)


Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning