CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation
Ziqi Zhang, Yuxin Chen, Zongyang Ma, Zhongang Qi, Chunfeng Yuan, Bing Li, Ying Shan, Weiming Hu

TL;DR
CREATE is a large-scale Chinese short video benchmark for video retrieval and title generation. It pairs a labeled dataset with large pre-training corpora and proposes a model that integrates multi-modal alignment with title generation.
Contribution
It introduces the first large-scale benchmark for Chinese short video retrieval and title generation, comprising datasets and a new model for multi-modal alignment.
Findings
Created a 210K labeled dataset for Chinese short videos.
Developed a model combining retrieval and titling tasks.
Facilitated future research in Chinese video applications.
Abstract
Previous work on video captioning aims to objectively describe a video's actual content, which lacks subjective and attractive expression and limits its practical application scenarios. Video titling targets this more expressive goal, but it lacks a proper benchmark. In this paper, we propose CREATE, the first large-scale Chinese shoRt vidEo retrievAl and Title gEneration benchmark, to facilitate research and application in video titling and video retrieval in Chinese. CREATE consists of a high-quality 210K labeled dataset and two large-scale pre-training datasets of 3M and 10M videos, covering 51 categories, 50K+ tags, 537K manually annotated titles and captions, and 10M+ short videos. Based on CREATE, we propose a novel model, ALWIG, which combines video retrieval and video titling tasks to achieve multi-modal ALignment WIth Generation with the help of video tags and a GPT…
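The abstract states that ALWIG combines video retrieval and video titling in one model, but the excerpt is truncated before the training objective is described. As a minimal sketch, assuming the two tasks are trained jointly with a CLIP-style symmetric contrastive (InfoNCE) loss for retrieval plus a token-level cross-entropy loss for title generation, a combined objective could look like the following. The function names, the `temperature` value, and the `weight` balancing term are hypothetical illustrations, not the paper's actual formulation.

```python
import math
from typing import List, Sequence


def log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(sim: Sequence[Sequence[float]], temperature: float = 0.07) -> float:
    """Hypothetical retrieval loss: symmetric InfoNCE over a batch.

    sim[i][j] is the similarity between video i and title j; matched
    video-title pairs are assumed to lie on the diagonal.
    """
    n = len(sim)
    scaled = [[s / temperature for s in row] for row in sim]
    # Video-to-title direction: each row should peak at its own diagonal entry.
    v2t = -sum(log_softmax(scaled[i])[i] for i in range(n)) / n
    # Title-to-video direction: each column should peak at its own diagonal entry.
    cols = [[scaled[i][j] for i in range(n)] for j in range(n)]
    t2v = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (v2t + t2v) / 2


def generation_loss(token_logits: Sequence[Sequence[float]], target_ids: Sequence[int]) -> float:
    """Hypothetical titling loss: mean token-level cross-entropy."""
    total = 0.0
    for logits, target in zip(token_logits, target_ids):
        total -= log_softmax(logits)[target]
    return total / len(target_ids)


def joint_loss(sim, token_logits, target_ids, weight: float = 1.0) -> float:
    """Sum of the retrieval and generation terms; `weight` is an assumed knob."""
    return contrastive_loss(sim) + weight * generation_loss(token_logits, target_ids)


if __name__ == "__main__":
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    print(joint_loss(aligned, [[2.0, 0.0, -1.0]], [0]))
```

Sharing one objective lets the alignment signal from retrieval shape the same representations the title decoder uses; the `weight` term is one simple way to trade the two tasks off against each other.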
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
Methods: Attention Is All You Need · Linear Layer · Adam · Residual Connection · Softmax · Layer Normalization · Discriminative Fine-Tuning · Dropout · Cosine Annealing
