SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama
Jing Tang, Quanlu Jia, Yuqiang Xie, Zeyu Gong, Xiang Wen, Jiayi Zhang, Yalong Guo, Guibin Chen, Jiangping Yang

TL;DR
SkyScript-100M is a dataset of one billion (script, shooting script) pairs for short dramas. It is built from about 80,000 internet-sourced short drama episodes and is intended to support research on script optimization and text-to-video generation.
Contribution
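The paper introduces SkyScript-100M, a large-scale dataset of scripts and shooting scripts for short dramas. It was built by collecting short drama episodes from the Internet, extracting and annotating keyframes to obtain shooting scripts, and restoring scripts from those shooting scripts with the authors' SkyReels short drama generation model. The dataset opens new research directions in script and short drama generation.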
Findings
SkyScript-100M surpasses existing datasets in size and diversity.
The dataset enables deeper insights into script and shooting script relationships.
It can significantly advance text-to-video and short drama generation research.
Abstract
Generating high-quality shooting scripts that contain information such as scenes and shot language is essential for short drama script generation. We collect 6,660 popular short drama episodes from the Internet, each with an average of 100 short episodes; the total number of short episodes is about 80,000, with a total duration of about 2,000 hours and a total size of 10 terabytes (TB). We perform keyframe extraction and annotation on each episode to obtain about 10,000,000 shooting scripts. For each extracted shooting script, we perform 100 script restorations using SkyReels, our self-developed large model for short drama generation. This yields a dataset of 1,000,000,000 pairs of scripts and shooting scripts for short dramas, called SkyScript-100M. We compare SkyScript-100M with existing datasets in detail and demonstrate some deeper insights that can be achieved based on…
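The pair count in the abstract follows from simple multiplication: about 10,000,000 shooting scripts, each restored 100 times, gives 1,000,000,000 (script, shooting script) pairs. The sketch below makes that expansion and a few derived per-episode averages explicit, using the abstract's total of about 80,000 episodes. It is a minimal illustration under stated assumptions, not the authors' pipeline: the `ScriptPair` record layout, its field names, and the `restore` callable (a stand-in for the SkyReels restoration step, whose interface the abstract does not describe) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator

# Approximate figures quoted in the abstract.
NUM_EPISODES = 80_000              # "about 80,000" short episodes
TOTAL_HOURS = 2_000                # "about 2,000 hours"
TOTAL_TB = 10                      # "10 terabytes (TB)", decimal units assumed
NUM_SHOOTING_SCRIPTS = 10_000_000  # "about 10,000,000 shooting scripts"
RESTORATIONS_PER_SCRIPT = 100      # "100 script restorations" per shooting script


@dataclass(frozen=True)
class ScriptPair:
    """Hypothetical record: one restored script paired with its source shooting script."""
    shooting_script_id: int
    restoration_index: int  # 0 .. restorations - 1
    shooting_script: str
    script: str


def expected_pair_count(num_shooting_scripts: int, restorations: int) -> int:
    """Each shooting script contributes one pair per restoration."""
    return num_shooting_scripts * restorations


def derived_averages() -> Dict[str, float]:
    """Per-episode averages implied by the abstract's totals."""
    return {
        "minutes_per_episode": TOTAL_HOURS * 60 / NUM_EPISODES,
        "shooting_scripts_per_episode": NUM_SHOOTING_SCRIPTS / NUM_EPISODES,
        "megabytes_per_episode": TOTAL_TB * 10**12 / NUM_EPISODES / 10**6,
    }


def expand_pairs(
    shooting_scripts: Iterable[str],
    restore: Callable[[str, int], str],  # hypothetical stand-in for SkyReels
    restorations: int = RESTORATIONS_PER_SCRIPT,
) -> Iterator[ScriptPair]:
    """Lazily yield `restorations` pairs for every shooting script, in input order."""
    for sid, shot in enumerate(shooting_scripts):
        for k in range(restorations):
            yield ScriptPair(sid, k, shot, restore(shot, k))


if __name__ == "__main__":
    print(expected_pair_count(NUM_SHOOTING_SCRIPTS, RESTORATIONS_PER_SCRIPT))  # 1000000000
    print(derived_averages())
    demo = list(expand_pairs(["INT. OFFICE - DAY, close-up"],
                             lambda shot, k: f"restored script #{k}",
                             restorations=3))
    print(len(demo))  # 3
```

Under these totals, an average episode runs about 1.5 minutes, yields about 125 shooting scripts, and occupies about 125 MB. A generator is used for the expansion because materializing a billion records in memory at once is impractical.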
