LarvSeg: Exploring Image Classification Data For Large Vocabulary Semantic Segmentation via Category-wise Attentive Classifier

Haojun Yu; Di Dai; Ziwei Zhao; Di He; Han Hu; Liwei Wang

arXiv:2501.06862·cs.CV·January 14, 2025

TL;DR

LarvSeg introduces a novel framework that leverages image classification data to enhance large vocabulary semantic segmentation, effectively addressing label annotation challenges and improving performance on categories without mask labels.

Contribution

LarvSeg is the first to utilize large-scale classification datasets like ImageNet21K for semantic segmentation, introducing a category-wise attentive classifier and a training framework for out-of-distribution categories.

Findings

01. Significant performance improvements on large vocabulary segmentation tasks.
02. Effective segmentation on categories without mask labels.
03. First large-scale 21K-category segmentation model using ImageNet21K.

Abstract

Scaling up the vocabulary of semantic segmentation models is extremely challenging because annotating large-scale mask labels is labour-intensive and time-consuming. Recently, language-guided segmentation models have been proposed to address this challenge. However, their performance drops significantly when applied to out-of-distribution categories. In this paper, we propose a new large vocabulary semantic segmentation framework, called LarvSeg. Unlike previous work, LarvSeg leverages image classification data to scale the vocabulary of semantic segmentation models, since large-vocabulary classification datasets usually contain balanced categories and are much easier to obtain. However, in classification the label is given at the image level, whereas segmentation requires a label for every pixel. To address this issue, we first propose a general baseline framework to…
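The mismatch the abstract describes (one label per image versus one label per pixel) can be illustrated with a small sketch. The truncated abstract does not say how LarvSeg bridges this gap, so the code below is not the paper's method. It assumes one generic option: average the per-pixel class scores into a single image-level score vector and apply ordinary softmax cross-entropy against the image label. The function names `image_level_logits`, `cross_entropy`, and `image_level_loss`, and the toy values, are hypothetical.

```python
import math

def image_level_logits(pixel_logits):
    """Average per-pixel class scores into one score per class.

    pixel_logits: list of pixels, each a list of C class scores.
    Assumption: plain average pooling, chosen only for illustration.
    """
    num_pixels = len(pixel_logits)
    num_classes = len(pixel_logits[0])
    return [
        sum(pixel[c] for pixel in pixel_logits) / num_pixels
        for c in range(num_classes)
    ]

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example with an integer label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def image_level_loss(pixel_logits, image_label):
    """Classification loss computed from a segmentation head's output."""
    return cross_entropy(image_level_logits(pixel_logits), image_label)

# Toy example: 4 pixels, 3 classes; two pixels favour class 0.
pixel_logits = [
    [2.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]
pooled = image_level_logits(pixel_logits)
loss_if_label_0 = image_level_loss(pixel_logits, 0)
loss_if_label_2 = image_level_loss(pixel_logits, 2)
```

In this toy case the loss is lower when the image label matches the class most pixels favour, so an image-level label can still send a training signal to a pixel-level predictor. Plain averaging treats every pixel equally, including background pixels. That limitation of this sketch may be one reason to weight pixels differently, though the abstract excerpt here does not give those details.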

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI