MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions

Xuan Ju; Yiming Gao; Zhaoyang Zhang; Ziyang Yuan; Xintao Wang; Ailing Zeng; Yu Xiong; Qiang Xu; Ying Shan

arXiv:2407.06358 · cs.CV · July 10, 2024 · 2 cites

TL;DR

MiraData is a large, high-quality video dataset with long durations and detailed structured captions, designed to improve video generation and evaluation, especially for high-motion, long-duration videos.

Contribution

The paper introduces MiraData, a novel dataset with longer videos and detailed captions, and MiraBench, an enhanced benchmark with new metrics for assessing motion and temporal consistency.

Findings

1. MiraData outperforms existing datasets in video duration and caption detail.
2. MiraBench provides comprehensive metrics including 3D consistency and motion strength.
3. Experiments show MiraDiT benefits from MiraData, especially in motion quality.

Abstract

Sora's high-motion intensity and long consistent videos have significantly impacted the field of video generation, attracting unprecedented attention. However, existing publicly available datasets are inadequate for generating Sora-like videos, as they mainly contain short videos with low motion intensity and brief captions. To address these issues, we propose MiraData, a high-quality video dataset that surpasses previous ones in video duration, caption detail, motion strength, and visual quality. We curate MiraData from diverse, manually selected sources and meticulously process the data to obtain semantically consistent clips. GPT-4V is employed to annotate structured captions, providing detailed descriptions from four different perspectives along with a summarized dense caption. To better assess temporal consistency and motion intensity in video generation, we introduce MiraBench,…

Code & Models

Models

Video-Bench/Video-Bench (model, 1 download, 1 like)

Datasets

TencentARC/MiraData (dataset, 180 downloads)

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Analysis and Summarization