Deep Video-Based Performance Cloning
Kfir Aberman, Mingyi Shi, Jing Liao, Dani Lischinski, Baoquan Chen, Daniel Cohen-Or

TL;DR
This paper introduces a deep learning method for performance cloning from videos, enabling the generation of realistic, temporally coherent videos of a target actor reenacting different performances without requiring motion capture data.
Contribution
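It proposes a dual-branch neural network architecture in which both branches train the same space-time conditional generator with shared weights. One branch learns the target actor's appearance from paired data self-generated from the reference video; the other uses unpaired data to improve temporal coherence on unseen pose sequences.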
Findings
Successfully generates temporally coherent videos of different dance performances.
Handles challenging scenarios with different performances and poses.
Operates without motion capture or depth information.
Abstract
We present a new video-based performance cloning technique. After training a deep generative network using a reference video capturing the appearance and dynamics of a target actor, we are able to generate videos where this actor reenacts other performances. All of the training data and the driving performances are provided as ordinary video segments, without motion capture or depth information. Our generative model is realized as a deep neural network with two branches, both of which train the same space-time conditional generator, using shared weights. One branch, responsible for learning to generate the appearance of the target actor in various poses, uses paired training data, self-generated from the reference video. The second branch uses unpaired data to improve generation of temporally coherent video renditions of unseen pose sequences. We demonstrate a variety of…
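The two-branch training idea in the abstract can be illustrated with a toy sketch. This is not the paper's network or its losses: the "generator" here is a hypothetical two-parameter linear map from a scalar pose to a scalar frame, the paired branch uses a plain squared error, and the unpaired branch uses a frame-to-frame smoothness penalty as a stand-in for temporal coherence. All names (`generate`, `paired_loss`, `unpaired_loss`, `train`, `lam`) and the data are assumptions made for illustration. The only point it shows is that both branches evaluate the same generator and their losses update one shared set of weights.

```python
# Toy illustration only: scalar "poses" and "frames", a linear "generator".
# The losses below are simplified stand-ins, not the ones used in the paper.

def generate(weights, poses):
    """Shared generator: maps each pose value to a frame value."""
    w, b = weights
    return [w * p + b for p in poses]

def paired_loss(weights, poses, frames):
    """Branch 1 (paired data): squared error against ground-truth frames."""
    out = generate(weights, poses)
    return sum((o - f) ** 2 for o, f in zip(out, frames)) / len(frames)

def unpaired_loss(weights, poses):
    """Branch 2 (unpaired data): no ground truth, so penalize abrupt
    frame-to-frame changes as a crude proxy for temporal coherence."""
    out = generate(weights, poses)
    diffs = [(a - b) ** 2 for a, b in zip(out[1:], out[:-1])]
    return sum(diffs) / len(diffs)

def total_loss(weights, paired_batch, unpaired_poses, lam=0.1):
    """Both branches score the same weights; lam balances the two terms."""
    poses, frames = paired_batch
    return (paired_loss(weights, poses, frames)
            + lam * unpaired_loss(weights, unpaired_poses))

def numerical_grad(f, weights, eps=1e-6):
    """Central finite differences, to keep the sketch dependency-free."""
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        up[i] += eps
        down = list(weights)
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads

def train(paired_batch, unpaired_poses, steps=500, lr=0.05, lam=0.1):
    """Gradient descent on the combined loss over one shared weight vector."""
    weights = [0.0, 0.0]
    for _ in range(steps):
        grads = numerical_grad(
            lambda w: total_loss(w, paired_batch, unpaired_poses, lam), weights)
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights

# Paired data: poses with matching frames (here generated by frame = 2*pose + 1).
PAIRED = ([0.0, 0.5, 1.0, 1.5], [1.0, 2.0, 3.0, 4.0])
# Unpaired data: a pose sequence with no corresponding frames of the target.
UNPAIRED_POSES = [0.2, 0.3, 0.4, 0.6]

if __name__ == "__main__":
    trained = train(PAIRED, UNPAIRED_POSES)
    print("shared weights:", trained)
    print("combined loss:", total_loss(trained, PAIRED, UNPAIRED_POSES))
```

In this sketch the unpaired term only nudges the shared weights, since `lam` is small; how the actual method balances its branches is not stated in the abstract.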
