InvSeg: Test-Time Prompt Inversion for Semantic Segmentation

Jiayi Lin; Jiabo Huang; Jian Hu; Shaogang Gong

arXiv:2410.11473 · cs.CV · January 6, 2025

TL;DR

InvSeg is a test-time prompt inversion method for open-vocabulary semantic segmentation. It enriches text prompts with image-specific, structure-aware visual context so that visual and textual features align more closely, and it reports state-of-the-art results on PASCAL VOC, PASCAL Context, and COCO.

Contribution

The paper proposes InvSeg, a novel method that inverts image-specific visual context into text prompts, improving semantic segmentation accuracy across diverse datasets.

Findings

1. Achieves state-of-the-art performance on the PASCAL VOC, PASCAL Context, and COCO datasets.

2. Uses Contrastive Soft Clustering to improve the distinction between masks and the internal consistency of each mask.

3. Effectively aligns visual and textual features for open-vocabulary segmentation.

Abstract

Visual-textual correlations in the attention maps derived from text-to-image diffusion models have proven beneficial to dense visual prediction tasks, e.g., semantic segmentation. However, a significant challenge arises from the input distributional discrepancy between the context-rich sentences used for image generation and the isolated class names typically used in semantic segmentation. This discrepancy hinders diffusion models from capturing accurate visual-textual correlations. To solve this, we propose InvSeg, a test-time prompt inversion method that tackles open-vocabulary semantic segmentation by inverting image-specific visual context into the text prompt embedding space, leveraging structure information derived from the diffusion model's reconstruction process to enrich text prompts so as to associate each class with a structure-consistent mask. Specifically, we introduce…
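To make the abstract's starting point concrete, the sketch below shows one simple way per-class attention maps could be turned into a segmentation mask. It is an illustrative assumption, not the InvSeg method: the function names (`softmax`, `attention_to_mask`), the min-max normalisation, the `temperature` parameter, and the toy maps are all hypothetical, and the attention maps are taken as given rather than extracted from a diffusion model.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attention_to_mask(attn_maps, temperature=1.0):
    """Convert per-class attention maps into a label map (hypothetical helper).

    attn_maps: list of C maps, each an H x W nested list of scores.
    Returns (labels, probs): labels[y][x] is the winning class index and
    probs[y][x] is the list of C class probabilities at that pixel.
    """
    num_classes = len(attn_maps)
    height, width = len(attn_maps[0]), len(attn_maps[0][0])

    # Min-max normalise each class map so no class wins purely on scale.
    normed = []
    for class_map in attn_maps:
        flat = [v for row in class_map for v in row]
        lo, hi = min(flat), max(flat)
        span = (hi - lo) or 1.0  # guard against a constant map
        normed.append([[(v - lo) / span for v in row] for row in class_map])

    labels, probs = [], []
    for y in range(height):
        label_row, prob_row = [], []
        for x in range(width):
            # Classes compete at each pixel through a softmax.
            p = softmax([normed[c][y][x] for c in range(num_classes)], temperature)
            prob_row.append(p)
            label_row.append(max(range(num_classes), key=p.__getitem__))
        labels.append(label_row)
        probs.append(prob_row)
    return labels, probs


# Toy 2 x 2 attention maps for two made-up class prompts.
cat_map = [[0.9, 0.8], [0.1, 0.2]]
sky_map = [[0.1, 0.3], [0.7, 0.9]]
labels, probs = attention_to_mask([cat_map, sky_map], temperature=0.1)
print(labels)
```

With isolated class names as prompts, the abstract argues that such maps are less accurate than they could be; the paper's contribution concerns improving the prompts that produce the maps, which this sketch does not attempt.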

Taxonomy

Topics: Speech Recognition and Synthesis · Advanced Neural Network Applications · Topic Modeling

Methods: Softmax · Attention Is All You Need · Diffusion · ALIGN