Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content

Qiuheng Wang; Yukai Shi; Jiarong Ou; Rui Chen; Ke Lin; Jiahao Wang; Boyuan Jiang; Haotian Yang; Mingwu Zheng; Xin Tao; Fei Yang; Pengfei Wan; Di Zhang

arXiv:2410.08260 · cs.CV · April 29, 2025

TL;DR

Koala-36M is a large-scale, high-quality video dataset designed to improve the alignment between detailed conditions and video content, enhancing the performance of video generation models.

Contribution

The paper introduces Koala-36M, a novel dataset with accurate temporal splitting, detailed captions, and quality filtering, addressing limitations of existing datasets.

Findings

01 A linear classifier on probability distributions improves transition detection accuracy, giving better temporal consistency within each clip.

02 Structured captions averaging 200 words improve text-video alignment.

03 Filtering for high-quality videos with the Video Training Suitability Score (VTSS) improves dataset reliability.
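
The first finding can be made concrete with a small sketch. It is not the paper's implementation: it assumes each frame is summarized as a normalized intensity histogram (one possible "probability distribution"), that the features are two distances between consecutive histograms, and that the linear classifier's weights are already known. The names `histogram`, `features`, `is_transition`, `detect_transitions`, and the values of `WEIGHTS` and `BIAS` are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram of a non-empty frame (a discrete distribution)."""
    counts = [0] * bins
    for v in frame:
        counts[min(v * bins // max_value, bins - 1)] += 1
    return [c / len(frame) for c in counts]


def features(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Two distances between distributions p and q (assumed features, not the paper's)."""
    l1 = sum(abs(a - b) for a, b in zip(p, q))         # L1 distance, in [0, 2]
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))   # Bhattacharyya coefficient, in [0, 1]
    return [l1, 1.0 - bc]


# Hypothetical hand-set parameters; a real system would learn these from labeled cuts.
WEIGHTS = [2.0, 2.0]
BIAS = -1.5


def is_transition(p: Sequence[float], q: Sequence[float]) -> bool:
    """Linear decision rule: flag a transition when w . x + b > 0."""
    score = sum(w * x for w, x in zip(WEIGHTS, features(p, q))) + BIAS
    return score > 0


def detect_transitions(frames: Sequence[Sequence[int]]) -> List[int]:
    """Return indices i such that a transition is flagged between frame i-1 and frame i."""
    hists = [histogram(f) for f in frames]
    return [i + 1 for i in range(len(hists) - 1) if is_transition(hists[i], hists[i + 1])]


if __name__ == "__main__":
    dark = [10] * 64     # toy "frame": 64 dark pixels
    bright = [240] * 64  # toy "frame": 64 bright pixels
    print(detect_transitions([dark, dark, dark, bright, bright, bright]))
```

In this toy input the only flagged boundary is between the last dark frame and the first bright one. Replacing a single hand-tuned threshold with a learned linear rule over several distribution distances is the general idea the finding describes; the specific features and training procedure are in the paper, not here.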

Abstract

With the continuous progress of visual generation technologies, the scale of video datasets has grown exponentially. The quality of these datasets plays a pivotal role in the performance of video generation models. We assert that temporal splitting, detailed captions, and video quality filtering are three crucial determinants of dataset quality. However, existing datasets exhibit various limitations in these areas. To address these challenges, we introduce Koala-36M, a large-scale, high-quality video dataset featuring accurate temporal splitting, detailed captions, and superior video quality. The essence of our approach lies in improving the consistency between fine-grained conditions and video content. Specifically, we employ a linear classifier on probability distributions to enhance the accuracy of transition detection, ensuring better temporal consistency. We then provide structured…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · AI in cancer detection