DisMo: Disentangled Motion Representations for Open-World Motion Transfer
Thomas Ressler-Antal, Frank Fundel, Malek Ben Alaya, Stefan Andreas Baumann, Felix Krause, Ming Gui, Björn Ommer

TL;DR
DisMo introduces a new method for learning abstract, disentangled motion representations from raw videos, enabling open-world motion transfer across diverse entities and improving motion understanding tasks.
Contribution
The paper presents DisMo, a novel approach for learning generic motion representations that are independent of appearance, facilitating flexible motion transfer and superior motion understanding.
Findings
Effective open-world motion transfer across unrelated entities
Outperforms state-of-the-art in zero-shot action classification
Compatible with existing video generators for enhanced flexibility
Abstract
Recent advances in text-to-video (T2V) and image-to-video (I2V) models have enabled the creation of visually compelling and dynamic videos from simple textual descriptions or initial frames. However, these models often fail to provide an explicit representation of motion separate from content, limiting their applicability for content creators. To address this gap, we propose DisMo, a novel paradigm for learning abstract motion representations directly from raw video data via an image-space reconstruction objective. Our representation is generic and independent of static information such as appearance, object identity, or pose. This enables open-world motion transfer, allowing motion to be transferred across semantically unrelated entities without requiring object correspondences, even between vastly different categories. Unlike prior methods, which trade off motion fidelity and prompt…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Multimodal Machine Learning Applications
