Getting More Juice Out of Your Data: Hard Pair Refinement Enhances Visual-Language Models Without Extra Data

Haonan Wang; Minbin Huang; Runhui Huang; Lanqing Hong; Hang Xu; Tianyang Hu; Xiaodan Liang; Zhenguo Li; Hong Cheng; Kenji Kawaguchi

arXiv:2305.05208·cs.CV·February 10, 2025·5 cites

TL;DR

This paper presents HELIP, a cost-effective method to enhance CLIP models by leveraging challenging pairs in existing datasets during continuous training, significantly improving performance without extra data or extensive retraining.

Contribution

HELIP introduces a simple approach to improving visual-language models: it exploits difficult text-image pairs already present in existing datasets during continued training, avoiding both additional data and extensive retraining.

Findings

01 Boosts zero-shot ImageNet accuracy by up to 10.1% in two epochs.

02 Improves fine-grained classification performance by up to 18.6%.

03 Enhances linear probe performance with significant gains.

Abstract

Contrastive Language-Image Pre-training (CLIP) has become the standard for cross-modal image-text representation learning. Improving CLIP typically requires additional data and retraining with new loss functions, but these demands raise resource and time costs, limiting practical use. In this work, we introduce HELIP, a cost-effective strategy that improves CLIP models by exploiting challenging text-image pairs within existing datasets in continuous training. This eliminates the need for additional data or extensive retraining. Moreover, HELIP integrates effortlessly into current training pipelines with minimal code modifications, allowing for quick and seamless implementation. On comprehensive benchmarks, HELIP consistently boosts existing models. In particular, within just two epochs of training, it improves zero-shot classification accuracy on ImageNet for SLIP models pre-trained on…
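To make the idea of "challenging text-image pairs" concrete, here is a minimal sketch in plain Python. It is not the HELIP implementation: the function names, the toy embeddings, and the temperature value of 0.07 are assumptions made for illustration. It computes a CLIP-style symmetric contrastive loss over a batch and, as a generic hard-negative heuristic, lists for each image the non-matching texts that score highest against it. How HELIP actually selects hard pairs is described in the paper and the official repository.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def similarity_matrix(image_embs, text_embs):
    """Cosine similarity between every image (rows) and every text (columns)."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    return [[dot(i, t) for t in txts] for i in imgs]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[k]
    return total / len(logits)


def clip_contrastive_loss(sim, temperature=0.07):
    """Symmetric image-to-text and text-to-image contrastive loss.

    The temperature is an assumed illustrative value, not one taken
    from the paper.
    """
    logits = [[s / temperature for s in row] for row in sim]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))


def hardest_negatives(sim, k=1):
    """For each image i, return indices of the k most similar non-matching texts.

    This is a generic hard-negative heuristic used here for illustration only.
    """
    hard = []
    for i, row in enumerate(sim):
        others = [j for j in range(len(row)) if j != i]
        others.sort(key=lambda j: row[j], reverse=True)
        hard.append(others[:k])
    return hard


if __name__ == "__main__":
    # Toy 2-D embeddings; pair 2 lies between pairs 0 and 1, so it is "hard".
    image_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    text_embs = [[1.0, 0.1], [0.1, 1.0], [1.0, 0.9]]
    sim = similarity_matrix(image_embs, text_embs)
    print("loss:", clip_contrastive_loss(sim))
    print("hardest negatives per image:", hardest_negatives(sim))
```

In this toy batch, the third caption sits close to both of the other images, so it surfaces as the hardest negative for each of them. Pairs like this contribute most to the contrastive loss, which is the intuition behind giving them extra attention during continued training.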

Code & Models

Repositories

haonan3/helip
PyTorch · Official

Videos

Getting More Juice Out of Your Data: Hard Pair Refinement Enhances Visual-Language Models Without Extra Data

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research

Methods: Contrastive Language-Image Pre-training