Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation

Dingwen Zhang; Hao Li; Diqi He; Nian Liu; Lechao Cheng; Jingdong Wang; Junwei Han

arXiv:2405.13388 · cs.CV · May 24, 2024

TL;DR

This paper introduces UPLVP, an unsupervised pre-training method using language-vision prompts to enhance query-based end-to-end instance segmentation models in low-data scenarios, achieving faster convergence and better performance.

Contribution

The paper proposes a novel unsupervised pre-training approach with language-vision prompts specifically designed for low-data instance segmentation, addressing limitations of existing QEIS methods.

Findings

01. Improved QEIS performance on MS COCO, Cityscapes, and CTW1500 datasets.

02. Faster convergence of QEIS models with pre-training.

03. Significant performance gains in low-data regimes.

Abstract

In recent times, following the paradigm of DETR (DEtection TRansformer), query-based end-to-end instance segmentation (QEIS) methods have exhibited superior performance compared to CNN-based models, particularly when trained on large-scale datasets. Nevertheless, the effectiveness of these QEIS methods diminishes significantly when confronted with limited training data. This limitation arises from their reliance on substantial data volumes to effectively train the pivotal queries/kernels that are essential for acquiring localization and shape priors. To address this problem, we propose a novel method for unsupervised pre-training in low-data regimes. Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts (UPLVP), which improves QEIS models' instance segmentation by bringing language-vision prompts to…
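
As a rough illustration of what "queries/kernels" means in query-based segmentation in general, the sketch below treats each query as a dynamic kernel that is dotted with per-pixel features to produce one mask per query. This is a generic, simplified formulation and not the UPLVP implementation; the function name `predict_masks`, the tensor shapes, and the random inputs are all assumptions made for the example.

```python
import numpy as np


def predict_masks(queries, pixel_features):
    """Turn N query vectors into N soft masks (illustrative only).

    queries:        (N, C) array, one learned kernel per candidate instance
    pixel_features: (C, H, W) array of per-pixel embeddings
    returns:        (N, H, W) array of mask probabilities in [0, 1]
    """
    c, h, w = pixel_features.shape
    # Dot each query with every pixel embedding: (N, C) @ (C, H*W) -> (N, H*W)
    logits = queries @ pixel_features.reshape(c, h * w)
    # Sigmoid maps logits to per-pixel foreground probabilities
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs.reshape(-1, h, w)


# Hypothetical sizes: 5 queries, 16 channels, an 8x8 feature map
rng = np.random.default_rng(0)
queries = rng.normal(size=(5, 16))
features = rng.normal(size=(16, 8, 8))
masks = predict_masks(queries, features)
```

In this simplified view, the quality of each mask depends entirely on how well the query vectors have been learned, which is consistent with the abstract's point that the queries/kernels need substantial data to train.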

Code & Models

Repositories

lifuguan/uplvp (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Handwritten Text Recognition Techniques

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections