GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training

Xinchi Deng; Han Shi; Runhui Huang; Changlin Li; Hang Xu; Jianhua Han; James Kwok; Shen Zhao; Wei Zhang; Xiaodan Liang

arXiv:2308.11331·cs.CV·August 23, 2023

TL;DR

GrowCLIP introduces a data-driven, automatic model growing approach for contrastive language-image pre-training that adapts architecture dynamically to continuously growing data, enhancing performance on downstream tasks.

Contribution

It proposes a novel dynamic growth space and parameter inheriting method, enabling models to adapt architecture during online learning for improved efficiency and accuracy.
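As a rough illustration of what inheriting parameters while growing a model can look like, the sketch below enlarges one linear layer's weight matrix and copies the old weights into the top-left block. The names `grow_linear` and `matvec` are hypothetical, and zero-initializing the new entries is an assumption made for this sketch; the source does not describe GrowCLIP's actual inheriting rule.

```python
from typing import List

Matrix = List[List[float]]


def grow_linear(weight: Matrix, new_out: int, new_in: int) -> Matrix:
    """Return a larger weight matrix whose top-left block is `weight`.

    New rows and columns are zero-initialized. This is an assumption for
    illustration only, not the initialization used in the paper.
    """
    old_out = len(weight)
    old_in = len(weight[0]) if weight else 0
    if new_out < old_out or new_in < old_in:
        raise ValueError("growing must not shrink either dimension")
    # Start from an all-zero matrix of the target shape.
    grown = [[0.0] * new_in for _ in range(new_out)]
    # Copy every inherited weight into the same (row, column) position.
    for i in range(old_out):
        for j in range(old_in):
            grown[i][j] = weight[i][j]
    return grown


def matvec(weight: Matrix, x: List[float]) -> List[float]:
    """Multiply a weight matrix by an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]
```

With this choice, the first rows of the grown layer compute the same outputs as the original layer for any input whose leading entries match the old input, so the larger model starts from the smaller model's behavior rather than from scratch.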

Findings

01 Achieves 2.3% higher top-1 accuracy on zero-shot classification

02 Improves top-1 image-to-text recall on Flickr30K by 1.2%

03 Demonstrates effective adaptation to continuously growing data

Abstract

Cross-modal pre-training has shown impressive performance on a wide range of downstream tasks, benefiting from massive image-text pairs collected from the Internet. In practice, online data are growing constantly, which makes it important for a pre-trained model to be able to learn from continuously growing data. Existing works on cross-modal pre-training mainly focus on training a network with a fixed architecture. However, it is impractical to limit the model capacity given the continuously growing nature of pre-training data in real-world applications. On the other hand, it is important to utilize the knowledge in the current model to obtain efficient training and better performance. To address these issues, we propose GrowCLIP, a data-driven automatic model growing algorithm for contrastive language-image pre-training with continuous…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning