CCMB: A Large-scale Chinese Cross-modal Benchmark
Chunyu Xie, Heng Cai, Jincheng Li, Fanjing Kong, Xiaoyu Wu, Jianfei Song, Henrique Morimitsu, Lin Yao, Dexin Wang, Xiangzheng Zhang, Dawei Leng, Baochang Zhang, Xiangyang Ji, Yafeng Deng

TL;DR
This paper introduces CCMB, a large-scale Chinese cross-modal benchmark that includes the largest public Chinese pre-training dataset, and R2D2, a novel vision-language pre-training framework that achieves state-of-the-art results on 12 Chinese vision-language tasks.
Contribution
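The work provides CCMB, a large-scale Chinese cross-modal benchmark consisting of the pre-training dataset Zero and five human-annotated fine-tuning datasets, together with R2D2, a new VLP framework. Both are intended to advance Chinese vision-language research.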
Findings
Achieved state-of-the-art results on 12 Chinese vision-language tasks.
Released Zero, currently the largest public Chinese cross-modal pre-training dataset, with 250M images paired with 750M text descriptions.
Developed a novel pre-training framework with ranking and distillation strategies.
Abstract
Vision-language pre-training (VLP) on large-scale datasets has shown strong performance on various downstream tasks. In contrast to the many available benchmarks with English corpora, large-scale pre-training and downstream datasets with Chinese corpora remain largely unexplored. In this work, we build a large-scale, high-quality Chinese Cross-Modal Benchmark named CCMB for the research community, which contains Zero, currently the largest public pre-training dataset, and five human-annotated fine-tuning datasets for downstream tasks. Zero contains 250 million images paired with 750 million text descriptions, and two of the five fine-tuning datasets are also currently the largest ones for Chinese cross-modal downstream tasks. Along with CCMB, we also develop a VLP framework named R2D2, applying a pre-Ranking + Ranking strategy to learn powerful vision-language representations…
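The abstract names a pre-Ranking + Ranking strategy but does not spell out its mechanics here. As a rough illustration only, the sketch below assumes the common two-stage retrieval pattern: a cheap dual-encoder similarity shortlists candidates (pre-ranking), and a more expensive scorer re-scores only that shortlist (ranking). The function names `pre_rank`, `rank`, and `toy_cross_score` are hypothetical, and the toy scorer stands in for a learned cross-modal model. In R2D2 the strategy is used to learn representations during pre-training; this sketch shows only the ranking pattern over fixed toy vectors and is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pre_rank(query: Vector, candidates: List[Vector], k: int) -> List[int]:
    """Stage 1 (cheap): score every candidate by embedding similarity, keep the top-k indices."""
    scores = [cosine(query, c) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def rank(
    query: Vector,
    candidates: List[Vector],
    shortlist: List[int],
    cross_score: Callable[[Vector, Vector], float],
) -> List[Tuple[int, float]]:
    """Stage 2 (expensive): re-score only the shortlisted candidates and sort by the new score."""
    rescored = [(i, cross_score(query, candidates[i])) for i in shortlist]
    return sorted(rescored, key=lambda t: t[1], reverse=True)


def toy_cross_score(q: Vector, c: Vector) -> float:
    """Hypothetical stand-in for a cross-encoder: negative squared Euclidean distance."""
    return -sum((x - y) ** 2 for x, y in zip(q, c))


if __name__ == "__main__":
    # Toy text-query embedding and four candidate image embeddings.
    text_query = [1.0, 0.0]
    images = [[0.9, 0.1], [0.0, 1.0], [2.0, 0.1], [0.5, 0.5]]

    shortlist = pre_rank(text_query, images, k=2)
    result = rank(text_query, images, shortlist, toy_cross_score)
    print(shortlist)                  # [2, 0]
    print([i for i, _ in result])     # [0, 2]
```

In this toy run the cosine stage shortlists images 2 and 0, and the finer scorer then reorders them, which is the point of the second stage: the expensive model only has to look at k candidates instead of the whole pool.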
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
Methods: R2D2 (vision-language pre-training with a pre-Ranking + Ranking strategy)
