CHIPS: Efficient CLIP Adaptation via Curvature-aware Hybrid Influence-based Data Selection

Xinlin Zhuang; Yichen Li; Xiwei Liu; Haolin Yang; Yifan Lu; Ziyun Zou; Yulong Li; Huifa Li; Dongliang Chen; Qinglei Wang; Weiyang Liu; Ying Qian; Jiangming Shi; Imran Razzak

arXiv:2511.18519·cs.LG·March 17, 2026

Open Access

TL;DR

This paper introduces CHIPS, a data selection method that effectively adapts CLIP to specific domains by selecting high-utility image-text pairs, reducing the need for large datasets and improving performance on medical and general benchmarks.

Contribution

CHIPS presents a novel, theoretically justified data selection approach that integrates curvature-aware alignment, scalable estimators, and relevance weighting for efficient CLIP adaptation.

Findings

01 CHIPS achieves state-of-the-art results among data selection methods on medical benchmarks.

02 CHIPS matches full-dataset continual pre-training (CPT) performance with only 30% of the data.

03 CHIPS outperforms half-dataset CPT using just 10% of the data.

Abstract

Adapting CLIP to vertical domains is typically approached by novel fine-tuning strategies or by continual pre-training (CPT) on large domain-specific datasets. Yet, data itself remains an underexplored factor in this process. We revisit this task from a data-centric perspective: Can effective data selection substitute for large-scale datasets in CPT? We introduce CHIPS (Curvature-aware Hybrid Influence in Projection Subspace), which assigns each image-text pair a utility score that integrates three complementary factors aligned with three goals: faithfulness via a curvature-aware and Newton-style alignment computed in CLIP's end-point subspace; scalability via an InfoNCE-aware curvature estimator with Johnson-Lindenstrauss (JL) sketching; and retention via a selection-aware relevance weight combined with learnability to balance target adaptation against general-domain preservation. We…
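The scoring-and-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes per-sample gradients and a target-domain gradient are already available as plain lists, it replaces the curvature-aware, Newton-style alignment with a plain first-order inner product, and every name in it (`jl_matrix`, `utility_scores`, `select_top_fraction`, the `weights` argument) is hypothetical. It shows only two ingredients: a Gaussian Johnson-Lindenstrauss sketch that approximately preserves inner products in a lower dimension, and keeping the top fraction of pairs by a weighted alignment score.

```python
import math
import random


def jl_matrix(d, k, seed=0):
    """Gaussian k x d projection, scaled by 1/sqrt(k) so that squared
    norms are preserved in expectation (standard JL construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, vec):
    """Apply the sketch: map a d-dimensional vector to k dimensions."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def utility_scores(sample_grads, target_grad, weights, k=64, seed=0):
    """Hypothetical utility: relevance weight times sketched gradient alignment.

    Assumption: curvature is ignored here (identity preconditioner), so this
    is a first-order stand-in for the Newton-style alignment in the abstract.
    """
    sketch = jl_matrix(len(target_grad), k, seed)
    target_low = project(sketch, target_grad)
    return [
        w * dot(project(sketch, g), target_low)
        for g, w in zip(sample_grads, weights)
    ]


def select_top_fraction(scores, fraction):
    """Return the (sorted) indices of the highest-scoring `fraction` of samples."""
    n_keep = max(1, int(round(fraction * len(scores))))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:n_keep])
```

In this sketch, a pair whose gradient points in the same direction as the target gradient receives a positive score and one pointing the opposite way receives a negative score, so calling `select_top_fraction(scores, 0.3)` keeps the 30% of pairs that are most aligned with the target domain under these simplifying assumptions.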

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Face recognition and analysis