DisMo: Disentangled Motion Representations for Open-World Motion Transfer

Thomas Ressler-Antal; Frank Fundel; Malek Ben Alaya; Stefan Andreas Baumann; Felix Krause; Ming Gui; Björn Ommer

arXiv:2511.23428·cs.CV·December 1, 2025

TL;DR

DisMo introduces a new method for learning abstract, disentangled motion representations from raw videos, enabling open-world motion transfer across diverse entities and improving motion understanding tasks.

Contribution

The paper presents DisMo, a novel approach for learning generic motion representations that are independent of appearance, facilitating flexible motion transfer and superior motion understanding.

Findings

1. Effective open-world motion transfer across unrelated entities
2. Outperforms state-of-the-art in zero-shot action classification
3. Compatible with existing video generators for enhanced flexibility

Abstract

Recent advances in text-to-video (T2V) and image-to-video (I2V) models have enabled the creation of visually compelling and dynamic videos from simple textual descriptions or initial frames. However, these models often fail to provide an explicit representation of motion separate from content, limiting their applicability for content creators. To address this gap, we propose DisMo, a novel paradigm for learning abstract motion representations directly from raw video data via an image-space reconstruction objective. Our representation is generic and independent of static information such as appearance, object identity, or pose. This enables open-world motion transfer, allowing motion to be transferred across semantically unrelated entities without requiring object correspondences, even between vastly different categories. Unlike prior methods, which trade off motion fidelity and prompt…

Videos

DisMo: Disentangled Motion Representations for Open-World Motion Transfer · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Multimodal Machine Learning Applications