Text-Guided Mixup Towards Long-Tailed Image Categorization

Richard Franklin; Jiawei Yao; Deyang Zhong; Qi Qian; Juhua Hu

arXiv:2409.03583·cs.CV·September 6, 2024

TL;DR

This paper introduces a text-guided mixup method that uses vision-language models such as CLIP to improve long-tailed image classification. It draws on semantic relations found in textual information and reports promising empirical results.

Contribution

It proposes a new text-guided mixup technique that uses pre-trained vision-language models to address long-tailed class distributions in image categorization.
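
For orientation, here is a minimal sketch of the two ingredients named above. `mixup` is the standard mixup operation, a convex combination of two inputs and their label vectors. `text_soft_label` is a hypothetical illustration of how class text embeddings could supply semantic relations between classes. It is an assumption made for this example, not the paper's formulation. All function names, the temperature value, and the toy embeddings are invented here. In practice the embeddings would come from a pre-trained text encoder such as CLIP's.

```python
import math
import random


def one_hot(index, num_classes):
    """Return a one-hot label vector as a list of floats."""
    return [1.0 if k == index else 0.0 for k in range(num_classes)]


def sample_lambda(alpha=1.0, rng=random):
    """Draw the mixing coefficient from Beta(alpha, alpha), as in standard mixup."""
    return rng.betavariate(alpha, alpha)


def mixup(x_i, x_j, y_i, y_j, lam):
    """Standard mixup: convex combination of two inputs and their label vectors."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def text_soft_label(class_index, text_embeddings, temperature=0.1):
    """Hypothetical helper (not from the paper): softmax over cosine similarities
    between one class's text embedding and every class's text embedding."""
    anchor = text_embeddings[class_index]
    scores = [cosine(anchor, t) / temperature for t in text_embeddings]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy text embeddings for three classes (made-up values for illustration).
toy_text_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

mixed_x, mixed_y = mixup([1.0, 0.0], [0.0, 1.0], one_hot(0, 3), one_hot(2, 3), 0.25)
soft = text_soft_label(0, toy_text_embeddings)
```

In this toy setup, class 1's embedding lies close to class 0's, so the soft label for class 0 gives class 1 more mass than class 2. That is the kind of semantic relation a text encoder could contribute when labels for rare classes are scarce.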

Findings

01 Effective in long-tailed benchmarks

02 Leverages semantic relations from text

03 Theoretical guarantees provided

Abstract

In many real-world applications, the frequency distribution of class labels for training data can exhibit a long-tailed distribution, which challenges traditional approaches of training deep neural networks that require large amounts of balanced data. Gathering and labeling data to balance out the class label distribution can be both costly and time-consuming. Many existing solutions that enable ensemble learning, re-balancing strategies, or fine-tuning applied to deep neural networks are limited by the inherent problem of few class samples across a subset of classes. Recently, vision-language models like CLIP have been observed as effective solutions to zero-shot or few-shot learning by grasping a similarity between vision and language features for image and text pairs. Considering that large pre-trained vision-language models may contain valuable side textual information for minor…

Code & Models

Repositories

rsamf/text-guided-mixup (PyTorch, official)

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Mixup · Contrastive Language-Image Pre-training