TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation

Wenhao Wang; Yi Yang

arXiv:2411.04709·cs.CV·July 10, 2025

TL;DR

TIP-I2V is a large-scale dataset of over 1.7 million user prompts for image-to-video generation, enabling research on prompt analysis, model evaluation, and safety improvements in this emerging field.

Contribution

The paper introduces TIP-I2V, the first extensive dataset of text and image prompts specifically for image-to-video generation, filling a critical gap in available resources.

Findings

01. Compares TIP-I2V with two popular prompt datasets, VidProM (text-to-video) and DiffusionDB (text-to-image), highlighting key differences.

02. Demonstrates potential uses of TIP-I2V for model analysis and safety enhancement.

03. Provides a large, curated dataset to advance image-to-video research.

Abstract

Video generation models are revolutionizing content creation, with image-to-video models drawing increasing attention due to their enhanced controllability, visual consistency, and practical applications. However, despite their popularity, these models rely on user-provided text and image prompts, and there is currently no dedicated dataset for studying these prompts. In this paper, we introduce TIP-I2V, the first large-scale dataset of over 1.70 million unique user-provided Text and Image Prompts specifically for Image-to-Video generation. Additionally, we provide the corresponding generated videos from five state-of-the-art image-to-video models. We begin by outlining the time-consuming and costly process of curating this large-scale dataset. Next, we compare TIP-I2V to two popular prompt datasets, VidProM (text-to-video) and DiffusionDB (text-to-image), highlighting differences in…

Code & Models

Datasets

WenhaoWang/TIP-I2V
dataset · 3.2k downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Video Analysis and Summarization

Methods: Softmax · Attention Is All You Need · Focus