TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Wenhao Wang, Yi Yang

TL;DR
TIP-I2V is a large-scale dataset of over 1.7 million user prompts for image-to-video generation, enabling research on prompt analysis, model evaluation, and safety improvements in this emerging field.
Contribution
The paper introduces TIP-I2V, the first extensive dataset of text and image prompts specifically for image-to-video generation, filling a critical gap in available resources.
Findings
Compared TIP-I2V with VidProM (text-to-video) and DiffusionDB (text-to-image), highlighting key differences.
Demonstrated potential uses of TIP-I2V for model analysis and safety enhancement.
Provided a large, curated dataset to advance image-to-video research.
Abstract
Video generation models are revolutionizing content creation, with image-to-video models drawing increasing attention due to their enhanced controllability, visual consistency, and practical applications. However, despite their popularity, these models rely on user-provided text and image prompts, and there is currently no dedicated dataset for studying these prompts. In this paper, we introduce TIP-I2V, the first large-scale dataset of over 1.70 million unique user-provided Text and Image Prompts specifically for Image-to-Video generation. Additionally, we provide the corresponding generated videos from five state-of-the-art image-to-video models. We begin by outlining the time-consuming and costly process of curating this large-scale dataset. Next, we compare TIP-I2V to two popular prompt datasets, VidProM (text-to-video) and DiffusionDB (text-to-image), highlighting differences in…
Taxonomy
Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Video Analysis and Summarization
Methods: Softmax · Attention Is All You Need · Focus
