DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation

Ivan Martinović; Josip Šarić; Marin Oršić; Matej Kristan; Siniša Šegvić

arXiv:2507.10118·cs.CV·July 15, 2025

TL;DR

DEARLi is a semi-supervised panoptic segmentation method that enhances recognition and localization separately, using CLIP for recognition and SAM for localization. It improves performance when labeled data is scarce and requires less GPU memory than previous methods.

Contribution

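The paper proposes DEARLi, which enhances recognition through zero-shot classification of CLIP features and enhances localization through a class-agnostic decoder warm-up on SAM pseudo-labels. It reports state-of-the-art results in semi-supervised semantic segmentation while requiring 8x less GPU memory than previous methods.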

Findings

01 Outperforms the state of the art in semi-supervised semantic segmentation.

02 Achieves 29.9 PQ and 38.9 mIoU on ADE20K with only 158 labeled images.

03 Requires 8x less GPU memory than previous methods.
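
For readers unfamiliar with the PQ figure reported in the findings, the sketch below shows how panoptic quality is commonly computed for a single class: the sum of IoUs over matched segment pairs (IoU above 0.5), divided by TP + 0.5 FP + 0.5 FN. This is a minimal illustration of the standard metric, not code from the DEARLi repository; the function name and the toy numbers are assumptions made for the example.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Per-class panoptic quality (illustrative helper, hypothetical name).

    matched_ious: IoU values of predicted/ground-truth segment pairs that
                  count as true positives (each IoU must exceed 0.5).
    """
    if any(iou <= 0.5 for iou in matched_ious):
        raise ValueError("matched pairs must have IoU > 0.5")
    true_positives = len(matched_ious)
    denominator = (true_positives
                   + 0.5 * num_false_positives
                   + 0.5 * num_false_negatives)
    if denominator == 0:
        return 0.0  # class absent from both prediction and ground truth
    return sum(matched_ious) / denominator


# Toy numbers (assumed): two matched segments, one spurious prediction,
# one missed ground-truth segment.
pq = panoptic_quality([0.8, 0.6], num_false_positives=1, num_false_negatives=1)
print(round(100 * pq, 1))
```

Benchmark PQ is typically the average of this per-class value over all classes, reported on a 0 to 100 scale.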

Abstract

Pixel-level annotation is expensive and time-consuming. Semi-supervised segmentation methods address this challenge by learning models on few labeled images alongside a large corpus of unlabeled images. Although foundation models could further account for label scarcity, effective mechanisms for their exploitation remain underexplored. We address this by devising a novel semi-supervised panoptic approach fueled by two dedicated foundation models. We enhance recognition by complementing unsupervised mask-transformer consistency with zero-shot classification of CLIP features. We enhance localization by class-agnostic decoder warm-up with respect to SAM pseudo-labels. The resulting decoupled enhancement of recognition and localization (DEARLi) particularly excels in the most challenging semi-supervised scenarios with large taxonomies and limited labeled data. Moreover, DEARLi outperforms…
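
The recognition step mentioned in the abstract, zero-shot classification of CLIP features, can be illustrated with a generic sketch: pool per-pixel image features inside a mask, then compare the pooled vector against one text embedding per class by cosine similarity. The code below is a minimal sketch under stated assumptions, not the authors' implementation. The toy 2-D vectors stand in for real CLIP embeddings, the function names and the temperature value are hypothetical, and how DEARLi combines these scores with mask-transformer predictions is not described in this listing.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def mask_pool(pixel_features, mask):
    """Average the feature vectors of the pixels selected by a binary mask."""
    selected = [f for f, m in zip(pixel_features, mask) if m]
    if not selected:
        raise ValueError("mask selects no pixels")
    dim = len(pixel_features[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


def zero_shot_probs(region_embedding, text_embeddings, temperature=0.01):
    """Softmax over cosine similarities between a region and class texts."""
    region = l2_normalize(region_embedding)
    logits = [
        sum(a * b for a, b in zip(region, l2_normalize(text))) / temperature
        for text in text_embeddings
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy stand-ins (assumed): three pixels with 2-D features, two class texts.
pixel_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
mask = [1, 1, 0]                      # the mask covers the first two pixels
text_embeddings = [[1.0, 0.0], [0.0, 1.0]]

probs = zero_shot_probs(mask_pool(pixel_features, mask), text_embeddings)
predicted_class = probs.index(max(probs))
```

In practice the pixel features and text embeddings would come from a CLIP image encoder and text encoder; here they are hard-coded so the example runs without any model weights.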

Code & Models

Repositories

helen1c/dearli (Official)

Taxonomy

Topics: Image Retrieval and Classification Techniques · Face and Expression Recognition

Methods: Segment Anything Model · Contrastive Language-Image Pre-training