MV-S2V: Multi-View Subject-Consistent Video Generation

Ziyang Song; Xinyu Gong; Bangya Liu; Zelin Zhao

arXiv:2601.17756 · cs.CV · May 5, 2026

TL;DR

This paper introduces MV-S2V, a multi-view subject-consistent video generation method that synthesizes videos from multiple reference views of a subject. It relies on synthetic training data and a new conditioning technique to improve 3D subject consistency.

Contribution

The work defines the multi-view S2V task, builds a synthetic data curation pipeline, and introduces TS-RoPE to keep cross-subject and cross-view references distinct during conditional generation.

Findings

01 Achieves superior 3D subject consistency with multi-view references.

02 Develops a synthetic data curation pipeline for training.

03 Introduces TS-RoPE to distinguish subjects and views effectively.

Abstract

Existing Subject-to-Video Generation (S2V) methods have achieved high-fidelity and subject-consistent video generation, yet remain constrained to single-view subject references. This limitation renders the S2V task reducible to an S2I + I2V pipeline, failing to exploit the full potential of video subject control. In this work, we propose and address the challenging Multi-View S2V (MV-S2V) task, which synthesizes videos from multiple reference views to enforce 3D-level subject consistency. To address the scarcity of training data, we first develop a synthetic data curation pipeline to generate highly customized synthetic data, complemented by a small-scale real-world captured dataset to boost the training of MV-S2V. Another key issue lies in the potential confusion between cross-subject and cross-view references in conditional generation. To overcome this, we further introduce Temporally…

Code & Models

Repositories

https://szy-young.github.io/mv-s2v