CelebV-Text: A Large-Scale Facial Text-Video Dataset
Jianhui Yu, Hao Zhu, Liming Jiang, Chen Change Loy, Weidong Cai, Wayne Wu

TL;DR
CelebV-Text is a large-scale facial text-video dataset built to advance research on text-driven face video generation. It contains 70,000 in-the-wild face video clips, each paired with 20 texts describing static and dynamic attributes, and comes with a benchmark for evaluation.
Contribution
The paper introduces CelebV-Text, a novel large-scale facial text-video dataset with high-quality annotations and a benchmark, addressing the lack of suitable datasets for facial text-to-video generation.
Findings
Statistical analysis of the videos, texts, and text-video relevance shows CelebV-Text to be superior to other datasets.
The dataset enables effective training and evaluation of facial text-to-video models.
Benchmark results demonstrate the dataset's utility for standardizing evaluation.
Abstract
Text-driven generation models are flourishing in video generation and editing. However, face-centric text-to-video generation remains a challenge due to the lack of a suitable dataset containing high-quality videos and highly relevant texts. This paper presents CelebV-Text, a large-scale, diverse, and high-quality dataset of facial text-video pairs, to facilitate research on facial text-to-video generation tasks. CelebV-Text comprises 70,000 in-the-wild face video clips with diverse visual content, each paired with 20 texts generated using the proposed semi-automatic text generation strategy. The provided texts are of high quality, describing both static and dynamic attributes precisely. The superiority of CelebV-Text over other datasets is demonstrated via comprehensive statistical analysis of the videos, texts, and text-video relevance. The effectiveness and potential of CelebV-Text…
Taxonomy
Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation
