OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation

Yexin Liu; Manyuan Zhang; Yueze Wang; Hongyu Li; Dian Zheng; Weiming Zhang; Changsheng Lu; Xunliang Cai; Yan Feng; Peng Pei; Harry Yang

arXiv:2512.08294·cs.CV·December 11, 2025

TL;DR

OpenSubject introduces a large-scale, video-derived dataset of 2.5 million samples for subject-driven image generation and manipulation, targeting two weaknesses of current models: drift from the reference identity and poor handling of complex, multi-subject scenes.

Contribution

The paper presents a video-derived dataset and a four-stage construction pipeline that exploits cross-frame identity priors for subject-driven image generation and manipulation.

Findings

01. Training with OpenSubject improves identity fidelity in complex scenes.

02. The dataset enhances subject-driven generation and manipulation performance.

03. The benchmark evaluates multiple aspects including identity and background consistency.

Abstract

Despite the promising progress in subject-driven image generation, current models often deviate from the reference identities and struggle in complex scenes with multiple subjects. To address this challenge, we introduce OpenSubject, a video-derived large-scale corpus with 2.5M samples and 4.35M images for subject-driven generation and manipulation. The dataset is built with a four-stage pipeline that exploits cross-frame identity priors. (i) Video Curation. We apply resolution and aesthetic filtering to obtain high-quality clips. (ii) Cross-Frame Subject Mining and Pairing. We utilize vision-language model (VLM)-based category consensus, local grounding, and diversity-aware pairing to select image pairs. (iii) Identity-Preserving Reference Image Synthesis. We introduce segmentation map-guided outpainting to synthesize the input images for subject-driven generation and box-guided…
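
The first two stages described in the abstract (filtering clips, then pairing frames of the same subject) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Clip` fields, the thresholds `min_side` and `min_aesthetic`, and the use of bounding-box IoU as a stand-in for "diversity" are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Clip:
    clip_id: str
    width: int
    height: int
    aesthetic: float  # hypothetical aesthetic score in [0, 1]


def curate(clips, min_side=720, min_aesthetic=0.5):
    """Keep clips whose shorter side and aesthetic score clear the
    (assumed) thresholds."""
    return [
        c for c in clips
        if min(c.width, c.height) >= min_side and c.aesthetic >= min_aesthetic
    ]


def box_iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def most_diverse_pair(boxes_by_frame):
    """Given {frame_index: box} for one subject, return the pair of frame
    indices whose boxes overlap least. Low overlap is used here as a crude
    proxy for a change in position or scale between the two frames."""
    frames = sorted(boxes_by_frame)
    return min(
        combinations(frames, 2),
        key=lambda p: box_iou(boxes_by_frame[p[0]], boxes_by_frame[p[1]]),
    )
```

In this sketch, a subject tracked across frames is paired with itself in the two frames where its location differs most, so the pair shares an identity while differing in context. A real pipeline would likely combine several diversity signals rather than box overlap alone.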

Code & Models

Datasets

AIPeanutman/OpenSubject
dataset · 2.1k downloads

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis