OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer

Pengze Zhang; Yanze Wu; Mengtian Li; Xu Bai; Songtao Zhao; Fulong Ye; Chong Mou; Xinghui Li; Zhuowei Chen; Qian He; Mingyuan Gao

arXiv:2601.14250 · cs.CV · January 21, 2026

TL;DR

OmniTransfer is a versatile framework that enhances spatio-temporal video transfer by leveraging multi-view information, enabling high-quality, flexible, and generalizable video generation without relying on task-specific priors.

Contribution

It introduces a unified approach with three key designs to improve appearance consistency, temporal control, and task adaptability in video transfer tasks.

Findings

01 Outperforms existing methods in appearance and temporal transfer tasks.
02 Matches pose-guided methods in motion transfer without pose information.
03 Establishes a new paradigm for flexible, high-fidelity video generation.

Abstract

Videos convey richer information than images or text, capturing both spatial and temporal dynamics. However, most existing video customization methods rely on reference images or task-specific temporal priors, failing to fully exploit the rich spatio-temporal information inherent in videos, thereby limiting flexibility and generalization in video generation. To address these limitations, we propose OmniTransfer, a unified framework for spatio-temporal video transfer. It leverages multi-view information across frames to enhance appearance consistency and exploits temporal cues to enable fine-grained temporal control. To unify various video transfer tasks, OmniTransfer incorporates three key designs: Task-aware Positional Bias that adaptively leverages reference video information to improve temporal alignment or appearance consistency; Reference-decoupled Causal Learning separating…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Multimodal Machine Learning Applications