Learning Long-term Motion Embeddings for Efficient Kinematics Generation

Nick Stracke; Kolja Bauer; Stefan Andreas Baumann; Miguel Angel Bautista; Josh Susskind; Björn Ommer

arXiv:2604.11737·cs.CV·April 14, 2026

TL;DR

This paper introduces a method for efficient long-term motion generation by learning a compressed motion embedding from large-scale trajectories, enabling realistic motion synthesis conditioned on text or spatial prompts.

Contribution

The paper proposes a long-term motion embedding learned with 64x temporal compression, together with a conditional flow-matching model that generates motion latents in this space conditioned on task descriptions.

Findings

1. The learned motion embedding achieves a temporal compression factor of 64x.

2. The generated motion distributions outperform those of state-of-the-art video models.

3. The method enables efficient generation of long, realistic motions conditioned on text prompts or spatial pokes.
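
To make the 64x figure concrete, the arithmetic can be sketched as follows. This is only an illustration: the function name `latent_steps` and the choice to round a partial final window up to one latent step are assumptions, not details taken from the paper.

```python
import math

TEMPORAL_COMPRESSION = 64  # factor stated in the findings above


def latent_steps(num_frames: int, factor: int = TEMPORAL_COMPRESSION) -> int:
    """Return how many latent time steps cover `num_frames` trajectory frames."""
    if num_frames <= 0 or factor <= 0:
        raise ValueError("num_frames and factor must be positive")
    # Assumption: a partial final window still occupies one latent step.
    return math.ceil(num_frames / factor)


for frames in (64, 256, 1000):
    print(frames, "frames ->", latent_steps(frames), "latent steps")
```

Under these assumptions, a generative model working in the embedding space handles roughly 1/64 as many time steps as one working on raw per-frame trajectories.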

Abstract

Understanding and predicting motion is a fundamental component of visual intelligence. Although modern video models exhibit strong comprehension of scene dynamics, exploring multiple possible futures through full video synthesis remains prohibitively inefficient. We model scene dynamics orders of magnitude more efficiently by directly operating on a long-term motion embedding that is learned from large-scale trajectories obtained from tracker models. This enables efficient generation of long, realistic motions that fulfill goals specified via text prompts or spatial pokes. To achieve this, we first learn a highly compressed motion embedding with a temporal compression factor of 64x. In this space, we train a conditional flow-matching model to generate motion latents conditioned on task descriptions. The resulting motion distributions outperform those of both state-of-the-art video…
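
The flow-matching step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes the common linear-interpolation path between noise and data and a fixed-step Euler sampler; the paper's actual path, network, and sampler may differ. The names `fm_training_pair`, `euler_sample`, and `oracle_velocity` are hypothetical, and the oracle velocity field (which is simply handed the true endpoint as its condition) stands in for a trained, text- or poke-conditioned network.

```python
import numpy as np


def fm_training_pair(x0, x1, t):
    """Linear-path flow matching: interpolated point x_t and its target velocity."""
    t = t.reshape(-1, *([1] * (x1.ndim - 1)))  # broadcast t over latent dims
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0  # constant velocity along the straight path
    return x_t, v_target


def euler_sample(velocity_fn, x0, cond, num_steps=50):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with fixed-step Euler."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = np.full((x.shape[0],), i * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x


def oracle_velocity(x, t, cond):
    """Toy stand-in for a trained model: the exact velocity when the endpoint is known."""
    t = t.reshape(-1, 1, 1)
    return (cond - x) / (1.0 - t)


rng = np.random.default_rng(0)
batch, steps, dim = 4, 2, 8  # hypothetical latent shape: (batch, latent steps, channels)
noise = rng.standard_normal((batch, steps, dim))
target = rng.standard_normal((batch, steps, dim))

# With the oracle, Euler integration transports the noise onto the target latents.
sample = euler_sample(oracle_velocity, noise, target, num_steps=50)
print(np.abs(sample - target).max())
```

In a real setting, `velocity_fn` would be a neural network trained to regress `v_target` at `x_t`, with the condition being an encoded text prompt or poke rather than the target itself.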

Code & Models

Repositories

compvis/long-term-motion (GitHub)

Models

CompVis/ZipMo (Hugging Face)
