Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese

An Yang; Junshu Pan; Junyang Lin; Rui Men; Yichang Zhang; Jingren Zhou; Chang Zhou

arXiv:2211.01335·cs.CV·May 24, 2023·53 cites

TL;DR

This paper introduces Chinese CLIP, a family of Chinese vision-language models pretrained on a new large-scale Chinese image-text dataset with a two-stage pretraining method. The models achieve state-of-the-art results on multiple benchmarks.

Contribution

The paper constructs a large-scale Chinese image-text dataset and develops five Chinese CLIP models of different sizes using a two-stage pretraining approach.

Findings

01

Chinese CLIP achieves state-of-the-art results on MUGE, Flickr30K-CN, and COCO-CN in both zero-shot and finetuning setups.

02

The models achieve competitive zero-shot image classification performance on the ELEVATER benchmark.

03

Two-stage pretraining (first with the image encoder frozen, then with all parameters optimized) improves model performance.
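The zero-shot image classification in the second finding presumably follows the usual CLIP recipe: each class name is turned into a text prompt (for example, a Chinese rendering of "a photo of a {label}"), the prompts are encoded by the text encoder, and each image is assigned to the class whose prompt embedding has the highest cosine similarity with the image embedding. The sketch below assumes the encoder outputs are already available as NumPy arrays; the function names and toy embeddings are hypothetical and are not taken from the ofa-sys/chinese-clip repository.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def zero_shot_classify(image_emb: np.ndarray, class_text_emb: np.ndarray) -> np.ndarray:
    """Return the predicted class index for each image (hypothetical helper).

    image_emb:      (num_images, dim) image-encoder outputs
    class_text_emb: (num_classes, dim) text-encoder outputs, one prompt per class
    """
    sims = l2_normalize(image_emb) @ l2_normalize(class_text_emb).T  # (num_images, num_classes)
    return sims.argmax(axis=1)


# Toy embeddings standing in for real encoder outputs (made-up values).
class_text_emb = np.array([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0],
                           [0.0, 0.0, 1.0]])
image_emb = np.array([[0.9, 0.1, 0.0],
                      [0.1, 0.2, 2.0],
                      [0.0, 3.0, 0.5]])
preds = zero_shot_classify(image_emb, class_text_emb)
print(preds)  # [0 2 1]
```

No training on the target classes is involved: changing the label set only means encoding a different list of prompts.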

Abstract

The tremendous success of CLIP (Radford et al., 2021) has promoted the research and application of contrastive learning for vision-language pretraining. In this work, we construct a large-scale dataset of image-text pairs in Chinese, where most data are retrieved from publicly available datasets, and we pretrain Chinese CLIP models on the new dataset. We develop 5 Chinese CLIP models of multiple sizes, spanning from 77 to 958 million parameters. Furthermore, we propose a two-stage pretraining method, where the model is first trained with the image encoder frozen and then trained with all parameters being optimized, to achieve enhanced model performance. Our comprehensive experiments demonstrate that Chinese CLIP can achieve state-of-the-art performance on MUGE, Flickr30K-CN, and COCO-CN in the setups of zero-shot learning and finetuning, and it is able to achieve competitive…
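The two-stage schedule described in the abstract can be illustrated on a toy problem. This is a minimal sketch under strong simplifying assumptions: each "encoder" is a single linear map, embeddings are not L2-normalized, the temperature is fixed, and plain gradient descent stands in for whatever optimizer is used in practice. Only the freezing logic mirrors the paper's description (image encoder frozen in stage 1, all parameters trained in stage 2); all function and variable names are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def contrastive_loss_and_grads(X, T, W_img, W_txt, tau=0.5):
    """Symmetric image-text contrastive loss and its gradients for two linear 'encoders'.

    Simplification (assumption): no L2 normalization, fixed temperature tau.
    """
    n = X.shape[0]
    U, V = X @ W_img, T @ W_txt             # (n, k) image and text embeddings
    S = U @ V.T / tau                       # (n, n) logits; matched pairs on the diagonal
    P_i2t = softmax(S, axis=1)              # each image scored against all texts
    P_t2i = softmax(S, axis=0)              # each text scored against all images
    diag = np.arange(n)
    loss = -0.5 * (np.log(P_i2t[diag, diag]).mean() + np.log(P_t2i[diag, diag]).mean())
    dS = 0.5 * ((P_i2t - np.eye(n)) + (P_t2i - np.eye(n))) / n  # d(loss)/dS
    dU, dV = dS @ V / tau, dS.T @ U / tau
    return loss, X.T @ dU, T.T @ dV         # loss, grad wrt W_img, grad wrt W_txt


def two_stage_pretrain(X, T, W_img, W_txt, stage1_steps, stage2_steps, lr=0.05):
    """Stage 1 updates only the text side; stage 2 updates both towers."""
    W_img, W_txt = W_img.copy(), W_txt.copy()
    for step in range(stage1_steps + stage2_steps):
        _, g_img, g_txt = contrastive_loss_and_grads(X, T, W_img, W_txt)
        W_txt -= lr * g_txt
        if step >= stage1_steps:            # image encoder is unfrozen only in stage 2
            W_img -= lr * g_img
    return W_img, W_txt


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16)) / 4.0          # toy "image features" for 8 pairs
T = rng.normal(size=(8, 12)) / 3.5          # toy "text features"; row i is paired with X[i]
W_img0 = rng.normal(scale=0.25, size=(16, 4))  # stands in for a pretrained image tower
W_txt0 = rng.normal(scale=0.25, size=(12, 4))

W_img, W_txt = two_stage_pretrain(X, T, W_img0, W_txt0, stage1_steps=50, stage2_steps=50)
```

Calling `two_stage_pretrain` with `stage2_steps=0` returns the image weights unchanged, which is a simple way to check that the image side really stays frozen during stage 1.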

Code & Models

Repositories

ofa-sys/chinese-clip
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques

Methods: Contrastive Language-Image Pre-training · Contrastive Learning
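For reference, the contrastive objective behind CLIP-style pretraining can be written as a symmetric cross-entropy over an N×N similarity matrix for a batch of N image-text pairs. The notation here (f and g for the image and text encoders, τ for the temperature) is introduced for illustration and follows the original CLIP formulation; details of the Chinese CLIP implementation may differ.

```latex
s_{ij} = \frac{1}{\tau}\,\frac{f(x_i)^\top g(t_j)}{\lVert f(x_i)\rVert\,\lVert g(t_j)\rVert},
\qquad
\mathcal{L} = -\frac{1}{2N}\sum_{i=1}^{N}\left[
  \log\frac{\exp(s_{ii})}{\sum_{j=1}^{N}\exp(s_{ij})}
  + \log\frac{\exp(s_{ii})}{\sum_{j=1}^{N}\exp(s_{ji})}
\right]
```

The first term treats each image as a query over all texts in the batch; the second treats each text as a query over all images.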