DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data
Wonjoon Jin, Jiyun Won, Janghyeok Han, Qi Dai, Chong Luo, Seung-Hwan Baek, Sunghyun Cho

TL;DR
DynaVid is a two-stage video synthesis framework that trains on synthetic optical flow to improve the realism and controllability of highly dynamic videos, compensating for the scarcity of such motion in commonly used training datasets.
Contribution
The paper presents a two-stage framework that decouples motion synthesis from appearance synthesis. It trains on synthetic optical flow, so the model can learn diverse motion patterns without reproducing the artificial look of rendered videos.
Findings
Improves realism in videos with vigorous human motion.
Enhances controllability in camera motion scenarios.
Outperforms existing models on dynamic motion benchmarks.
Abstract
Despite recent progress, video diffusion models still struggle to synthesize realistic videos involving highly dynamic motions or requiring fine-grained motion controllability. A central limitation lies in the scarcity of such examples in commonly used training datasets. To address this, we introduce DynaVid, a video synthesis framework that leverages synthetic motion data in training, which is represented as optical flow and rendered using computer graphics pipelines. This approach offers two key advantages. First, synthetic motion offers diverse motion patterns and precise control signals that are difficult to obtain from real data. Second, unlike rendered videos with artificial appearances, rendered optical flow encodes only motion and is decoupled from appearance, thereby preventing models from reproducing the unnatural look of synthetic videos. Building on this idea, DynaVid adopts…
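To make "motion rendered as optical flow" concrete, here is a minimal sketch of how a flow field can be derived from a synthetic scene with known depth and camera motion. It is not DynaVid's rendering pipeline, which the abstract does not detail. It assumes a static scene, a pinhole camera, and a rigid camera motion, and the function name `render_flow` and all of its parameters are hypothetical.

```python
def render_flow(depth, fx, fy, cx, cy, R, t):
    """Optical flow induced by a rigid camera motion over a static scene.

    Illustrative assumptions (not taken from the paper):
      depth  : H x W nested list of positive depths in the first camera frame
      fx, fy : focal lengths in pixels; cx, cy : principal point in pixels
      R, t   : 3x3 rotation and 3-vector translation mapping first-frame
               camera coordinates to second-frame camera coordinates
    Returns an H x W nested list of (du, dv) pixel displacements.
    """
    flow = []
    for v, row in enumerate(depth):
        flow_row = []
        for u, z in enumerate(row):
            # Back-project pixel (u, v) to a 3D point in the first camera frame.
            p = ((u - cx) / fx * z, (v - cy) / fy * z, z)
            # Move the point into the second camera frame.
            q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
            # Project into the second image (assumes the point stays in front
            # of the camera, i.e. q[2] > 0).
            u2 = fx * q[0] / q[2] + cx
            v2 = fy * q[1] / q[2] + cy
            flow_row.append((u2 - u, v2 - v))
        flow.append(flow_row)
    return flow


# Usage: a flat wall 2 units away, with a sideways shift of 0.1 units
# between the two frames and no rotation.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
wall = [[2.0] * 4 for _ in range(3)]
flow = render_flow(wall, fx=100.0, fy=100.0, cx=2.0, cy=1.5,
                   R=IDENTITY, t=[0.1, 0.0, 0.0])
# Every pixel moves horizontally by about fx * tx / z = 5 pixels.
```

The output contains only displacements, with no color or texture, which is the property the abstract relies on: the motion signal carries no information about how the rendered scene looked.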
