Image Comes Dancing with Collaborative Parsing-Flow Video Synthesis
Bowen Wu, Zhenyu Xie, Xiaodan Liang, Yubei Xiao, Haoye Dong, Liang Lin

TL;DR
This paper introduces CPF-Net, a scalable model that transfers human motion from a source video to any target person using only one image of that person, integrating human parsing and appearance flow for realistic video synthesis.
Contribution
The work presents a novel, unified framework that decouples human parsing, appearance flow, and video generation, enabling scalable and realistic motion transfer for any target person.
Findings
Outperforms previous methods in realism and consistency
Generates photo-realistic videos with diverse poses
Achieves high temporal coherence in synthesized videos
Abstract
Transferring human motion from a source to a target person has great potential in computer vision and graphics applications. A crucial step is to manipulate sequential future motion while retaining the appearance characteristics. Previous work has either relied on crafted 3D human models or trained a separate model specifically for each target person, which is not scalable in practice. This work studies a more general setting, in which we aim to learn a single model, named Collaborative Parsing-Flow Network (CPF-Net), to parsimoniously transfer motion from a source video to any target person given only one image of that person. The paucity of information about the target person makes it particularly challenging to faithfully preserve the appearance across varying designated poses. To address this issue, CPF-Net integrates the structured human parsing and appearance flow to guide…
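"Appearance flow" commonly refers to a dense field of 2D offsets that tells each pixel of the output where to copy its appearance from in the source image. The following is a minimal NumPy sketch of that warping step. It assumes backward warping with bilinear sampling and clamping at the image border. It is a generic illustration of the concept, not CPF-Net's implementation, and the function name `warp_with_appearance_flow` is hypothetical.

```python
import numpy as np


def warp_with_appearance_flow(source: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Hypothetical helper: backward-warp `source` (H, W, C) with `flow` (H, W, 2).

    flow[y, x] = (dx, dy) is the offset into `source` from which output pixel
    (x, y) samples. Values are read with bilinear interpolation; sampling
    coordinates outside the image are clamped to the border (an assumption).
    """
    h, w = source.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Absolute (possibly fractional) sampling coordinates in the source image.
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)
    # Integer corners surrounding each sampling point.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    # Fractional weights, with a trailing axis so they broadcast over channels.
    wx = (sx - x0)[..., None]
    wy = (sy - y0)[..., None]
    # Interpolate horizontally on the top and bottom rows, then vertically.
    top = (1 - wx) * source[y0, x0] + wx * source[y0, x1]
    bottom = (1 - wx) * source[y1, x0] + wx * source[y1, x1]
    return (1 - wy) * top + wy * bottom
```

With an all-zero flow the function returns the source unchanged. A constant flow of `(1, 0)` makes every output pixel copy its right-hand neighbour. Learned flow fields are used because they copy real texture from the single available image instead of synthesizing it from scratch, which matters when only one image of the target person is given.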
