TL;DR
This paper introduces a training-free motion factorization framework for compositional video generation that decomposes complex motion into three categories (motionlessness, rigid motion, and non-rigid motion) and guides the synthesis of each category accordingly.
Contribution
It proposes a novel, model-agnostic motion factorization approach that improves motion synthesis by explicitly modeling motion categories without additional training.
Findings
Achieves impressive motion synthesis performance on real-world benchmarks.
Effectively disentangles motion categories during video generation.
Framework is compatible with various diffusion models.
Abstract
Compositional video generation aims to synthesize multiple instances with diverse appearance and motion. However, current approaches mainly focus on binding semantics, neglecting to understand diverse motion categories specified in prompts. In this paper, we propose a motion factorization framework that decomposes complex motion into three primary categories: motionlessness, rigid motion, and non-rigid motion. Specifically, our framework follows a planning-before-generation paradigm. (1) During planning, we reason about motion laws on the motion graph to obtain frame-wise changes in the shape and position of each instance. This alleviates semantic ambiguities in the user prompt by organizing it into a structured representation of instances and their interactions. (2) During generation, we modulate the synthesis of distinct motion categories in a disentangled manner. Conditioned on the…
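To make the planning step more concrete, here is a minimal Python sketch of how frame-wise changes in shape and position might be derived per instance from its motion category. It is not the authors' implementation: the names `MotionCategory`, `Box`, `Instance`, and `plan_boxes`, the constant per-frame velocity, and the per-frame scale factor are all hypothetical. It also assumes, for illustration only, that rigid motion changes position while keeping shape fixed and that non-rigid motion may change both.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class MotionCategory(Enum):
    # The three categories named in the abstract.
    MOTIONLESS = "motionlessness"
    RIGID = "rigid motion"
    NON_RIGID = "non-rigid motion"


@dataclass(frozen=True)
class Box:
    # Normalized bounding box: centre (cx, cy), width w, height h.
    cx: float
    cy: float
    w: float
    h: float


@dataclass
class Instance:
    name: str
    category: MotionCategory
    start: Box
    # Hypothetical parameters, not taken from the paper:
    velocity: Tuple[float, float] = (0.0, 0.0)  # per-frame shift of the centre
    scale_per_frame: float = 1.0                # per-frame change in box size


def plan_boxes(instance: Instance, num_frames: int) -> List[Box]:
    """Return one box per frame for a single instance.

    Assumption (illustrative only): motionless instances keep their box,
    rigid instances move without changing shape, and non-rigid instances
    change both position and shape.
    """
    boxes = []
    for t in range(num_frames):
        s = instance.start
        if instance.category is MotionCategory.MOTIONLESS:
            boxes.append(s)
            continue
        cx = s.cx + t * instance.velocity[0]
        cy = s.cy + t * instance.velocity[1]
        if instance.category is MotionCategory.RIGID:
            boxes.append(Box(cx, cy, s.w, s.h))
        else:
            k = instance.scale_per_frame ** t
            boxes.append(Box(cx, cy, s.w * k, s.h * k))
    return boxes
```

For example, `plan_boxes(Instance("car", MotionCategory.RIGID, Box(0.25, 0.5, 0.25, 0.25), velocity=(0.125, 0.0)), 3)` yields three boxes whose centre shifts right each frame while the width and height stay constant. A motion graph, as described in the abstract, would additionally encode interactions between instances, which this sketch omits.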
