Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation

Jinlin Liu; Kai Yu; Mengyang Feng; Xiefan Guo; Miaomiao Cui

arXiv:2405.16393·cs.CV·May 29, 2024

TL;DR

This paper presents a novel human video synthesis method that models both foreground and background dynamics using separate motion representations, resulting in more realistic and coherent videos with natural environmental interactions.

Contribution

It introduces a technique to learn and generate synchronized foreground and background motions, extending video length without error accumulation through clip-based generation and continuity strategies.
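
As a rough illustration of clip-based generation, the sketch below splits a long frame sequence into consecutive clips and records, for each clip after the first, the index of the previous clip's last frame as a seed. This is only one plausible continuity strategy, assumed here for illustration; the summary above does not specify the paper's actual mechanism, and `plan_clips` and its parameters are hypothetical names.

```python
from typing import List, Optional, Tuple


def plan_clips(total_frames: int, clip_len: int) -> List[Tuple[int, int, Optional[int]]]:
    """Plan consecutive clips covering `total_frames` frames.

    Returns one (start, end, seed_frame) tuple per clip, with `end` exclusive.
    `seed_frame` is the index of the previous clip's last frame (an assumed
    continuity hook), or None for the first clip.
    """
    if total_frames <= 0 or clip_len <= 0:
        raise ValueError("total_frames and clip_len must be positive")

    clips: List[Tuple[int, int, Optional[int]]] = []
    start = 0
    while start < total_frames:
        # The final clip may be shorter than clip_len.
        end = min(start + clip_len, total_frames)
        seed = start - 1 if start > 0 else None
        clips.append((start, end, seed))
        start = end
    return clips


if __name__ == "__main__":
    for start, end, seed in plan_clips(total_frames=40, clip_len=16):
        print(f"frames [{start}, {end}) seeded by frame {seed}")
```

Because each clip depends only on a single seed frame rather than on the full history, a schedule like this keeps per-clip cost constant as the video grows; whether it avoids error accumulation depends on how the generator uses the seed, which this sketch does not model.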

Findings

01 Generated videos show improved realism with dynamic backgrounds.

02 The method outperforms prior approaches in coherence between foreground and background.

03 Longer video sequences maintain consistency and natural motion.

Abstract

Recent advancements in human video synthesis have enabled the generation of high-quality videos through the application of stable diffusion models. However, existing methods predominantly concentrate on animating only the human element (the foreground) guided by pose information, while leaving the background entirely static. In authentic, high-quality videos, by contrast, backgrounds often move in harmony with foreground movements rather than remaining static. We introduce a technique that concurrently learns both foreground and background dynamics by separating their movements using distinct motion representations. Human figures are animated using pose-based motion, which captures intricate actions. For backgrounds, we employ sparse tracking points to model motion, thereby reflecting the natural interaction between foreground activity and environmental…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Human Pose and Action Recognition

Methods: Contrastive Language-Image Pre-training · Diffusion