Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation
Jinlin Liu, Kai Yu, Mengyang Feng, Xiefan Guo, Miaomiao Cui

TL;DR
This paper presents a human video synthesis method that models foreground (human) and background dynamics with separate motion representations. The result is more realistic, coherent videos in which the environment responds naturally to the person's movement.
Contribution
It introduces a technique that jointly learns and generates synchronized foreground and background motion. It also extends video length by generating clip by clip, using continuity strategies so that errors do not accumulate across clips.
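The summary does not spell out the clip-based generation or the continuity strategies. As a rough illustration only, the sketch below assumes a common overlapping-window scheme. The video is split into fixed-length clips, and each clip after the first receives the last few frames already generated as context. Those shared frames are then dropped so that nothing is duplicated. The names `plan_clips`, `generate_long_video`, and the `generate_clip` callback are hypothetical stand-ins for a real model, and this bookkeeping is not claimed to be the authors' method.

```python
from typing import Any, Callable, List, Tuple


def plan_clips(num_frames: int, clip_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) frame ranges in which consecutive
    clips share `overlap` frames (hypothetical scheduling scheme)."""
    if not 0 <= overlap < clip_len:
        raise ValueError("need 0 <= overlap < clip_len")
    clips: List[Tuple[int, int]] = []
    start, stride = 0, clip_len - overlap
    while start < num_frames:
        end = min(start + clip_len, num_frames)
        clips.append((start, end))
        if end == num_frames:
            break
        start += stride
    return clips


def generate_long_video(num_frames: int, clip_len: int, overlap: int,
                        generate_clip: Callable[[int, int, List[Any]], List[Any]]) -> List[Any]:
    """Generate clip by clip. Each clip after the first is given the last
    `overlap` frames produced so far as context; the clip returns those
    frames at its start, and they are dropped to avoid duplicates."""
    video: List[Any] = []
    for start, end in plan_clips(num_frames, clip_len, overlap):
        context = video[start:start + overlap]  # empty for the first clip
        clip = generate_clip(start, end, context)
        video.extend(clip[len(context):])
    return video


def stub_generate_clip(start: int, end: int, context: List[Any]) -> List[Any]:
    # Stand-in for a video diffusion model: each "frame" is just its index.
    return list(range(start, end))


print(plan_clips(40, 16, 4))  # [(0, 16), (12, 28), (24, 40)]
print(len(generate_long_video(40, 16, 4, stub_generate_clip)))  # 40
```

Overlap trades compute for continuity. A larger `overlap` gives each clip more shared context, but it shrinks the stride and so increases the number of clips needed for a given length.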
Findings
Generated videos show improved realism with dynamic backgrounds.
The method outperforms prior approaches in coherence between foreground and background.
Longer video sequences maintain consistency and natural motion.
Abstract
Recent advances in human video synthesis have made it possible to generate high-quality videos with Stable Diffusion models. However, existing methods mainly animate only the human (the foreground), guided by pose information, and leave the background entirely static. In real, high-quality videos, by contrast, backgrounds are rarely static: they often move in harmony with the foreground. We introduce a technique that learns foreground and background dynamics jointly by separating their movements into distinct motion representations. Human figures are animated with pose-based motion, which captures intricate actions. Backgrounds, in contrast, are modeled with sparse tracking points, reflecting the natural interaction between foreground activity and environmental…
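The abstract states that background motion is represented by sparse tracking points while the human is driven by pose, but not how the points are encoded for the model. A minimal sketch, assuming a hypothetical encoding: each visible background point writes its frame-to-frame displacement at its pixel location. Points that fall on a (hypothetical) per-frame human mask are skipped, because the pose signal covers the foreground. The function name `sparse_motion_maps` and the array layouts are assumptions for illustration, not the paper's interface.

```python
import numpy as np


def sparse_motion_maps(tracks: np.ndarray, visible: np.ndarray,
                       fg_masks: np.ndarray) -> np.ndarray:
    """Rasterize sparse background point tracks into displacement maps.

    Hypothetical encoding, for illustration only.
    tracks:   (N, T, 2) float array of (x, y) pixel positions of N points.
    visible:  (N, T) bool array, True where a point is visible.
    fg_masks: (T, H, W) bool array, True on the human foreground.
    Returns a (T-1, H, W, 2) float32 array holding, at a background point's
    pixel in frame t, its (dx, dy) displacement to frame t+1; zeros elsewhere.
    """
    num_points, num_frames, _ = tracks.shape
    _, height, width = fg_masks.shape
    maps = np.zeros((num_frames - 1, height, width, 2), dtype=np.float32)
    for t in range(num_frames - 1):
        for n in range(num_points):
            # Skip points that are occluded in either frame of the pair.
            if not (visible[n, t] and visible[n, t + 1]):
                continue
            x, y = np.round(tracks[n, t]).astype(int)
            # Skip points that fall outside the image.
            if not (0 <= x < width and 0 <= y < height):
                continue
            # Skip points on the person; the pose signal drives the foreground.
            if fg_masks[t, y, x]:
                continue
            maps[t, y, x] = tracks[n, t + 1] - tracks[n, t]
    return maps


# Usage: one background point moving right, one point on the person.
tracks = np.array([[[1, 1], [2, 1], [3, 1]],
                   [[5, 5], [5, 6], [5, 7]]], dtype=np.float64)
visible = np.ones((2, 3), dtype=bool)
fg_masks = np.zeros((3, 8, 8), dtype=bool)
fg_masks[:, 4:8, 4:8] = True  # hypothetical human mask
maps = sparse_motion_maps(tracks, visible, fg_masks)
print(maps.shape)  # (2, 8, 8, 2)
```

Keeping the two signals in separate channels like this mirrors the abstract's idea of distinct motion representations: the background map stays empty wherever the human mask is set.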
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Human Pose and Action Recognition
Methods: Contrastive Language-Image Pre-training · Diffusion
