IDEA: Image Description Enhanced CLIP-Adapter

Zhipeng Ye; Feng Jiang; Qiufeng Wang; Kaizhu Huang; Jiaqi Huang

arXiv:2501.08816·cs.CV·January 22, 2025

TL;DR

The paper introduces IDEA, a training-free method that improves CLIP's few-shot image classification by using textual descriptions of images alongside visual features, matching or exceeding state-of-the-art models on multiple tasks.

Contribution

The paper proposes a training-free image-description enhancement for CLIP (IDEA) and extends it with trainable components (T-IDEA), improving performance on multiple datasets.

Findings

01 IDEA achieves results comparable to or better than those of state-of-the-art models.

02 T-IDEA further improves performance with lightweight learnable modules.

03 The authors generated 1.6 million image-text pairs for dataset enhancement.

Abstract

CLIP (Contrastive Language-Image Pre-training) has attained great success in pattern recognition and computer vision. Transferring CLIP to downstream tasks (e.g., zero- or few-shot classification) is a hot topic in multimodal learning. However, current studies primarily focus on either prompt learning for text or adapter tuning for vision, without fully exploiting the complementary information and correlations among image-text pairs. In this paper, we propose an Image Description Enhanced CLIP-Adapter (IDEA) method to adapt CLIP to few-shot image classification tasks. This method captures fine-grained features by leveraging both visual features and textual descriptions of images. IDEA is a training-free method for CLIP, and it is comparable to, or even exceeds, state-of-the-art models on multiple tasks. Furthermore, we introduce Trainable-IDEA (T-IDEA), which extends IDEA by adding two…
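The abstract does not give the scoring rule, so the following is only a minimal sketch of how a training-free few-shot classifier could combine visual features with description features. It assumes a Tip-Adapter-style cache term added to CLIP's zero-shot logits. The names `classify` and `predict`, the hyperparameters `alpha`, `beta`, and `lam`, and the toy 2-D features are all hypothetical, and this should not be read as the paper's actual formulation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify(query, class_text, support, alpha=1.0, beta=5.0, lam=0.5):
    """Return one logit per class for a query image feature.

    class_text: one text feature per class (the zero-shot CLIP term).
    support: few-shot cache of (image_feature, description_feature, label).
    alpha, beta, lam: hypothetical weights, not values from the paper.
    """
    # Zero-shot term: similarity between the query image and each class prompt.
    logits = [cosine(query, t) for t in class_text]
    for image_feat, desc_feat, label in support:
        # Assumed affinity: blend image-image and image-description similarity.
        sim = lam * cosine(query, image_feat) + (1.0 - lam) * cosine(query, desc_feat)
        # Tip-Adapter-style sharpening; the vote goes to the support sample's class.
        logits[label] += alpha * math.exp(-beta * (1.0 - sim))
    return logits

def predict(query, class_text, support):
    """Index of the highest-scoring class."""
    logits = classify(query, class_text, support)
    return max(range(len(logits)), key=logits.__getitem__)

# Toy 2-D stand-ins for CLIP embeddings: two classes, one support sample each.
class_text = [[1.0, 0.0], [0.0, 1.0]]
support = [
    ([0.9, 0.1], [1.0, 0.2], 0),
    ([0.1, 0.9], [0.2, 1.0], 1),
]

print(predict([0.8, 0.3], class_text, support))
```

No parameters are learned here: the support set acts as a fixed cache, which is what makes a scheme of this kind training-free. In practice the feature vectors would come from a frozen CLIP image and text encoder rather than hand-written lists.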

Code & Models

Repositories

fourierai/idea · PyTorch · Official

Taxonomy

Topics: AI in cancer detection · Medical Image Segmentation Techniques · Radiomics and Machine Learning in Medical Imaging

Methods: Focus · Contrastive Language-Image Pre-training · LLaMA · Adapter