Multi-Frame Content Integration with a Spatio-Temporal Attention Mechanism for Person Video Motion Transfer
Kun Cheng, Hao-Zhi Huang, Chun Yuan, Lingyiqing Zhou, Wei Liu

TL;DR
This paper introduces a novel method for person video motion transfer that uses multi-frame integration and spatio-temporal attention to enhance appearance detail and temporal consistency in generated videos.
Contribution
It proposes a multi-frame integration approach with a spatio-temporal attention mechanism for improved motion transfer in person videos, surpassing single-frame methods.
Findings
Produces more photo-realistic videos than previous methods.
Achieves better temporal consistency in generated videos.
Enables flexible background substitution.
Abstract
Existing person video generation methods either lack flexibility in controlling both appearance and motion, or fail to preserve detailed appearance and temporal consistency. In this paper, we tackle the problem of motion transfer for generating person videos, which provides control over both appearance and motion. Specifically, we transfer the motion of one person in a target video to another person in a source video, while preserving the appearance of the source person. Rather than relying on only one source frame, as existing state-of-the-art methods do, our proposed method integrates information from multiple source frames based on a spatio-temporal attention mechanism to preserve rich appearance details. In addition to a spatial discriminator employed to encourage frame-level fidelity, a multi-range temporal discriminator is adopted to enforce the generated video…
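As a rough illustration of what integrating multiple source frames through attention can look like, the sketch below pools features from every spatial location of every source frame with standard scaled dot-product attention. This is an assumption made for illustration only: the abstract does not specify the paper's actual attention formulation, and the function names, the (key, value) layout, and the toy feature vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatio_temporal_attention(query, source_frames):
    """Aggregate source features for one target location (illustrative only).

    query:         feature vector (list of floats) for one target location.
    source_frames: list of frames; each frame is a list of (key, value)
                   pairs, one pair per spatial location in that frame.

    Returns (output, weights), where weights has one entry per
    (frame, location) pair and sums to 1.
    """
    # Flatten time and space into one axis so attention spans both.
    keys, values = [], []
    for frame in source_frames:
        for key, value in frame:
            keys.append(key)
            values.append(value)

    # Scaled dot-product similarity between the query and every key.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)

    # Weighted sum of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights
```

Flattening the frame and location axes into a single list is the simplest way to let one query attend across both space and time; a real model would typically operate on batched tensors and learn the query, key, and value projections.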
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition
