UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions
Zhucun Xue, Jiangning Zhang, Teng Hu, Haoyang He, Yinan Chen, Yuxuan Cai, Yabiao Wang, Chengjie Wang, Yong Liu, Xiangtai Li, Dacheng Tao

TL;DR
This paper introduces UltraVideo, a high-quality UHD-4K video dataset with detailed captions, and UltraWan models for generating high-resolution videos, aiming to advance UHD video generation research.
Contribution
The paper presents a new large-scale UHD-4K video dataset with comprehensive captions, built with a highly automated four-stage curation process, along with UltraWan models for high-quality video generation.
Findings
UltraVideo covers more than 100 topics, and each video has 9 structured captions with one summarized caption (average of 824 words).
UltraWan models can generate high-quality 1K/4K videos with improved text controllability.
The dataset and models support future UHD video generation research.
Abstract
The quality of a video dataset (image quality, resolution, and fine-grained captions) greatly influences the performance of video generation models. The growing demand for video applications sets higher requirements for high-quality video generation models, for example the generation of movie-level Ultra-High Definition (UHD) videos and the creation of 4K short video content. However, existing public datasets cannot support related research and applications. In this paper, we first propose a high-quality open-sourced UHD-4K (22.4% of which are 8K) text-to-video dataset named UltraVideo, which contains a wide range of topics (more than 100 kinds), and each video has 9 structured captions with one summarized caption (average of 824 words). Specifically, we carefully design a highly automated curation process with four stages to obtain the final high-quality dataset: i)…
Taxonomy
Topics: Advanced Image Processing Techniques · Video Coding and Compression Technologies · Image and Video Quality Assessment
