Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation

Yongkang Li; Tianheng Cheng; Bin Feng; Wenyu Liu; Xinggang Wang

arXiv:2412.04533·cs.CV·March 11, 2025

TL;DR

Mask-Adapter improves open-vocabulary segmentation by extracting richer semantic features and enforcing mask consistency, leading to significant performance gains across multiple benchmarks and models.

Contribution

The paper introduces Mask-Adapter, a novel method that enhances mask classification accuracy by extracting semantic activation maps and applying a mask consistency loss.

Findings

01. Significant performance improvements on zero-shot segmentation benchmarks.

02. Mask-Adapter extends effectively to the SAM model.

03. Robustness to variations in the predicted masks is demonstrated.

Abstract

Recent open-vocabulary segmentation methods adopt mask generators to predict segmentation masks and leverage pre-trained vision-language models, e.g., CLIP, to classify these masks via mask pooling. Although these approaches show promising results, it is counterintuitive that accurate masks often fail to yield accurate classification results through pooling CLIP image embeddings within the mask regions. In this paper, we reveal the performance limitations of mask pooling and introduce Mask-Adapter, a simple yet effective method to address these challenges in open-vocabulary segmentation. Compared to directly using proposal masks, our proposed Mask-Adapter extracts semantic activation maps from proposal masks, providing richer contextual information and ensuring alignment between masks and CLIP. Additionally, we propose a mask consistency loss that encourages proposal masks with similar…
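To make the baseline discussed in the abstract concrete, here is a minimal sketch of mask pooling: per-pixel image embeddings are averaged inside a proposal mask, and the pooled vector is matched against text embeddings by cosine similarity. This illustrates the mask-pooling baseline only, not Mask-Adapter itself. The function names `mask_pool` and `classify_mask`, the array shapes, and the toy data are assumptions made for illustration; real pipelines would use CLIP image and text embeddings in place of the hand-written arrays.

```python
import numpy as np


def mask_pool(features, mask):
    """Average per-pixel embeddings inside a binary mask.

    features: (H, W, D) array of image embeddings (stand-in for CLIP features).
    mask:     (H, W) array of 0/1 values for a single proposal mask.
    """
    weights = mask.astype(features.dtype)
    area = weights.sum()
    if area == 0:
        # Empty mask: return a zero vector rather than dividing by zero.
        return np.zeros(features.shape[-1], dtype=features.dtype)
    return (features * weights[..., None]).sum(axis=(0, 1)) / area


def classify_mask(features, mask, text_embeddings):
    """Pick the text embedding most similar to the pooled mask embedding.

    text_embeddings: (C, D) array, one row per candidate class name.
    Returns (best_class_index, cosine_scores).
    """
    pooled = mask_pool(features, mask)
    pooled = pooled / (np.linalg.norm(pooled) + 1e-8)
    text = text_embeddings / (
        np.linalg.norm(text_embeddings, axis=1, keepdims=True) + 1e-8
    )
    scores = text @ pooled
    return int(np.argmax(scores)), scores


# Toy example (hypothetical data): a 2x2 feature map with 2-dim embeddings.
features = np.zeros((2, 2, 2))
features[0, :, :] = [1.0, 0.0]  # top row resembles class 0
features[1, :, :] = [0.0, 1.0]  # bottom row resembles class 1
mask = np.array([[1, 1], [0, 0]])  # proposal mask covering the top row
text_embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])

label, scores = classify_mask(features, mask, text_embeddings)
print(label, scores)
```

Because pooling is a plain average, every pixel inside the mask contributes equally and nothing outside the mask contributes at all, so the pooled vector carries no surrounding context.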

Code & Models

Repositories

hustvl/maskadapter (PyTorch, official)

Models

owl10/Mask-Adapter (Hugging Face model)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: ADaptive gradient method with the OPTimal convergence rate · Segment Anything Model · Contrastive Language-Image Pre-training