Phantom-Data : Towards a General Subject-Consistent Video Generation Dataset

Zhuowei Chen; Bingchuan Li; Tianxiang Ma; Lijie Liu; Mingcong Liu; Yi Zhang; Gen Li; Xinghui Li; Siyu Zhou; Qian He; Xinglong Wu

arXiv:2506.18851 · cs.CV · June 24, 2025

TL;DR

Phantom-Data is a large, diverse dataset designed to improve subject consistency in video generation models by addressing the copy-paste problem through cross-pair training.

Contribution

The paper introduces Phantom-Data, the first large-scale, general-purpose dataset for cross-pair subject-to-video consistency, built to improve how faithfully models follow text prompts while preserving subject identity.

Findings

01. Training with Phantom-Data improves prompt alignment.

02. Training with Phantom-Data enhances visual quality in generated videos.

03. Identity consistency remains comparable to that of in-pair methods.

Abstract

Subject-to-video generation has witnessed substantial progress in recent years. However, existing models still face significant challenges in faithfully following textual instructions. This limitation, commonly known as the copy-paste problem, arises from the widely used in-pair training paradigm. This approach inherently entangles subject identity with background and contextual attributes by sampling reference images from the same scene as the target video. To address this issue, we introduce Phantom-Data, the first general-purpose cross-pair subject-to-video consistency dataset, containing approximately one million identity-consistent pairs across diverse categories. Our dataset is constructed via a three-stage pipeline: (1) a general and input-aligned subject detection module, (2) large-scale cross-context subject retrieval from more than 53 million videos and 3 billion…
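
The difference between in-pair and cross-pair sampling described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Sample` record, its `subject_id` and `scene_id` fields, and both sampling functions are hypothetical names introduced only for illustration, and it assumes identity and scene labels are already available.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Sample:
    """Hypothetical record: one image or frame showing a subject in a scene."""
    subject_id: str  # assumed identity label (not from the paper)
    scene_id: str    # assumed source-video / scene label (not from the paper)


def sample_in_pair(target: Sample, pool: List[Sample]) -> Sample:
    """In-pair: the reference comes from the same scene as the target,
    so the subject's identity is entangled with background and context."""
    candidates = [
        s for s in pool
        if s.subject_id == target.subject_id and s.scene_id == target.scene_id
    ]
    return random.choice(candidates)


def sample_cross_pair(target: Sample, pool: List[Sample]) -> Optional[Sample]:
    """Cross-pair: the reference shows the same subject in a different scene.
    Returns None when no cross-context match exists in the pool."""
    candidates = [
        s for s in pool
        if s.subject_id == target.subject_id and s.scene_id != target.scene_id
    ]
    return random.choice(candidates) if candidates else None


pool = [Sample("dog", "v1"), Sample("dog", "v2"), Sample("cat", "v1")]
target = Sample("dog", "v1")
in_ref = sample_in_pair(target, pool)        # same subject, same scene
cross_ref = sample_cross_pair(target, pool)  # same subject, different scene
```

In this toy pool, the cat has no cross-context match, so `sample_cross_pair` returns `None` for it; a real dataset would need a retrieval step to find such matches at scale.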
