TL;DR
SynMotion is a novel video generation model that jointly leverages semantic guidance and visual adaptation to produce motion-customized videos with high fidelity and diversity.
Contribution
It introduces dual-embedding semantic comprehension, motion adapters, and a new training strategy supported by a curated benchmark, advancing motion customization in video synthesis.
Findings
Outperforms existing baselines on text-to-video (T2V) and image-to-video (I2V) tasks.
Enhances motion fidelity and temporal coherence.
Promotes motion specificity while maintaining subject diversity.
Abstract
Diffusion-based video motion customization facilitates the acquisition of human motion representations from a few video samples, while achieving arbitrary subject transfer through precise textual conditioning. Existing approaches often rely on semantic-level alignment, expecting the model to learn new motion concepts and combine them with other entities (e.g., "cats" or "dogs") to produce visually appealing results. However, video data involve complex spatio-temporal patterns, and focusing solely on semantics causes the model to overlook the visual complexity of motion. Conversely, tuning only the visual representation leads to semantic confusion in representing the intended action. To address these limitations, we propose SynMotion, a new motion-customized video generation model that jointly leverages semantic guidance and visual adaptation. At the semantic level, we introduce the…
