A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation
Niclas Vödisch, Kürsat Petek, Markus Käppeler, Abhinav Valada, Wolfram Burgard

TL;DR
This paper introduces PASTEL, a label-efficient panoptic segmentation method that leverages foundation models, lightweight training, and self-training to achieve high accuracy with minimal annotations in robotic perception tasks.
Contribution
It proposes a novel fusion module and self-training scheme that significantly improve label-efficient segmentation performance using foundation model features.
Findings
Outperforms previous label-efficient segmentation methods
Effective in autonomous driving and agricultural robotics
Requires fewer annotations for high accuracy
Abstract
A key challenge for the widespread application of learning-based models for robotic perception is to significantly reduce the required amount of annotated training data while achieving accurate predictions. This is essential not only to decrease operating costs but also to speed up deployment time. In this work, we address this challenge for PAnoptic SegmenTation with fEw Labels (PASTEL) by exploiting the groundwork paved by visual foundation models. We leverage descriptive image features from such a model to train two lightweight network heads for semantic segmentation and object boundary detection, using very few annotated training samples. We then merge their predictions via a novel fusion module that yields panoptic maps based on normalized cut. To further enhance the performance, we utilize self-training on unlabeled images selected by a feature-driven similarity scheme. We…
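The abstract says the fusion module derives panoptic maps "based on normalized cut". As a rough illustration of that building block only, the sketch below bipartitions a small graph with the standard spectral relaxation of normalized cut. It is not the authors' fusion module: the function name `normalized_cut_bipartition`, the toy affinity values, and the idea of treating low affinity as a stand-in for a predicted object boundary are all assumptions made for this example.

```python
import numpy as np


def normalized_cut_bipartition(affinity):
    """Split a graph into two groups via the spectral relaxation of normalized cut.

    `affinity` is a symmetric, non-negative (n, n) matrix in which every node
    has at least one positive edge. Returns a boolean array of length n.
    """
    W = np.asarray(affinity, dtype=float)
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)

    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; column 1 is the second smallest.
    _, eigvecs = np.linalg.eigh(laplacian)

    # Map back to the generalized problem (D - W) y = lambda * D * y.
    fiedler = d_inv_sqrt * eigvecs[:, 1]

    # The sign of the eigenvector is arbitrary, so only the grouping is meaningful.
    return fiedler > 0


# Toy graph (hypothetical values): nodes 0-2 and 3-5 are strongly connected
# within each group and weakly connected across, as if a boundary lay between them.
toy_affinity = np.full((6, 6), 0.01)
toy_affinity[:3, :3] = 1.0
toy_affinity[3:, 3:] = 1.0
np.fill_diagonal(toy_affinity, 0.0)

groups = normalized_cut_bipartition(toy_affinity)
```

On the toy graph, nodes 0-2 land in one group and nodes 3-5 in the other. Which group is marked `True` depends on the arbitrary sign of the eigenvector.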
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
