Exploring Vision-Language Models for Imbalanced Learning

Yidong Wang; Zhuohao Yu; Jindong Wang; Qiang Heng; Hao Chen; Wei Ye; Rui Xie; Xing Xie; Shikun Zhang

arXiv:2304.01457·cs.AI·June 22, 2023·1 cites

Open Access · 1 Repo

TL;DR

This paper enhances vision-language models for imbalanced datasets by adding a lightweight decoder and applying imbalanced learning techniques, significantly improving classification accuracy on challenging datasets.

Contribution

Introduces a lightweight decoder and combines imbalanced learning methods to improve VLM performance on skewed datasets, addressing memory issues and tail class recognition.

Findings

01 Significant accuracy improvements on ImageNet-LT, iNaturalist18, and Places-LT.

02 VLMs equipped with the decoder and imbalanced learning methods outperform baseline VLMs.

03 Analysis of how pre-training data size, backbone, and training cost affect performance.

Abstract

Vision-language models (VLMs) that use contrastive language-image pre-training have shown promising zero-shot classification performance. However, they perform relatively poorly on imbalanced datasets, where the skewed class distribution of the training data leads to poor prediction of minority classes. For instance, CLIP achieved only 5% accuracy on the iNaturalist18 dataset. We propose adding a lightweight decoder to VLMs to avoid the out-of-memory (OOM) problem caused by a large number of classes and to capture nuanced features for tail classes. We then explore improving VLMs through prompt tuning, fine-tuning, and imbalanced learning algorithms such as Focal Loss, Balanced SoftMax, and Distribution Alignment. Experiments demonstrate that the performance of VLMs can be further boosted when they are combined with the decoder and imbalanced methods. Specifically, our…
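
Two of the imbalanced learning losses named in the abstract can be illustrated in a few lines. The sketch below is not taken from the paper or its repository: it uses only the Python standard library, the function names (`log_softmax`, `focal_loss`, `balanced_softmax_loss`) are hypothetical, and it assumes the standard published forms of the losses. Focal loss is written for a single example without the optional class-weighting term, and Balanced Softmax is written as cross-entropy over logits shifted by the log class prior.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    Down-weights examples the model already classifies confidently.
    The optional alpha class weight is omitted (assumption for brevity).
    """
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    return -((1.0 - p_t) ** gamma) * log_p_t


def balanced_softmax_loss(logits, target, class_counts):
    """Balanced Softmax for one example.

    Adds the log of each class's training frequency to its logit before
    cross-entropy, so frequent (head) classes need a larger raw logit
    to win during training.
    """
    total = sum(class_counts)
    adjusted = [z + math.log(n / total) for z, n in zip(logits, class_counts)]
    return -log_softmax(adjusted)[target]


if __name__ == "__main__":
    logits = [0.0, 0.0]   # model is undecided between two classes
    counts = [90, 10]     # hypothetical head/tail training counts
    print(focal_loss(logits, target=0))
    print(balanced_softmax_loss(logits, target=1, class_counts=counts))
```

With `gamma=0`, `focal_loss` reduces to ordinary cross-entropy, and with equal class counts `balanced_softmax_loss` does the same, which gives a quick sanity check for either function.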

Code & Models

Repositories

imbalance-vlm/imbalance-vlm
pytorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI

Methods: Softmax · Contrastive Language-Image Pre-training · Focal Loss