GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
Xinchi Deng, Han Shi, Runhui Huang, Changlin Li, Hang Xu, Jianhua Han, James Kwok, Shen Zhao, Wei Zhang, Xiaodan Liang

TL;DR
GrowCLIP introduces a data-driven, automatic model growing approach for contrastive language-image pre-training that adapts architecture dynamically to continuously growing data, enhancing performance on downstream tasks.
Contribution
It proposes a novel dynamic growth space and parameter inheriting method, enabling models to adapt architecture during online learning for improved efficiency and accuracy.
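The summary above does not say how GrowCLIP inherits parameters, so the following is only a generic sketch of one common way a grown model can reuse a smaller model's weights. It is not the authors' method. It widens the hidden layer of a toy two-layer ReLU network, copies the old weights unchanged, and zero-initializes the outgoing weights of the new units so the grown network initially computes the same function. The helper names (`forward`, `grow_hidden`) and the initialization scale are assumptions made for illustration.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def forward(W1, W2, x):
    """Toy two-layer network: y = W2 * relu(W1 * x)."""
    return matvec(W2, relu(matvec(W1, x)))


def grow_hidden(W1, W2, new_hidden, seed=0):
    """Widen the hidden layer to `new_hidden` units (hypothetical helper).

    Old weights are copied as-is. New hidden units get small random
    incoming weights and zero outgoing weights, so the output of the
    grown network matches the original at the moment of growth.
    """
    old_hidden = len(W1)
    if new_hidden < old_hidden:
        raise ValueError("new_hidden must be >= current hidden width")
    rng = random.Random(seed)
    in_dim = len(W1[0])
    extra = new_hidden - old_hidden
    # Inherit existing rows, then append freshly initialized rows.
    W1_new = [row[:] for row in W1] + [
        [rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(extra)
    ]
    # Inherit existing columns, then append zero columns for the new units.
    W2_new = [row[:] + [0.0] * extra for row in W2]
    return W1_new, W2_new


W1 = [[0.5, -1.0], [1.5, 2.0]]
W2 = [[1.0, -0.5]]
x = [1.0, 2.0]

before = forward(W1, W2, x)
W1_big, W2_big = grow_hidden(W1, W2, new_hidden=4)
after = forward(W1_big, W2_big, x)
print(before, after)  # both outputs are identical right after growing
```

Starting the grown model from the same function means training resumes from the smaller model's current loss rather than from scratch, which is the general motivation for inheriting parameters when capacity is increased.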
Findings
Achieves 2.3% higher top-1 accuracy on zero-shot classification
Improves top-1 image-to-text recall on Flickr30K by 1.2%
Demonstrates effective adaptation to continuously growing data
Abstract
Cross-modal pre-training has shown impressive performance on a wide range of downstream tasks, benefiting from massive image-text pairs collected from the Internet. In practice, online data are growing constantly, highlighting the importance of a pre-trained model's ability to learn from continuously growing data. Existing works on cross-modal pre-training mainly focus on training a network with a fixed architecture. However, it is impractical to limit the model capacity when considering the continuously growing nature of pre-training data in real-world applications. On the other hand, it is important to utilize the knowledge in the current model to obtain efficient training and better performance. To address the above issues, in this paper, we propose GrowCLIP, a data-driven automatic model growing algorithm for contrastive language-image pre-training with continuous…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
