Envisioning the Future, One Step at a Time

Stefan Andreas Baumann; Jannik Wiese; Tommaso Martorella; Mahdi M. Kalayeh; Björn Ommer

arXiv:2604.09527·cs.CV·April 13, 2026

TL;DR

This paper introduces a sparse, trajectory-based autoregressive diffusion model for efficiently predicting diverse, long-horizon future scene dynamics. It runs faster than dense methods while matching or exceeding their accuracy.

Contribution

It proposes a novel approach that predicts sparse point trajectories with an autoregressive diffusion model, and introduces OWM, a new benchmark for open-set motion prediction in real-world videos.

Findings

01

Predicts faster than dense simulators.

02

Matches or exceeds dense methods in predictive accuracy.

03

Enables scalable, long-range, multi-modal future scene simulation.
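For a rough sense of why a sparse representation can be cheaper to generate than dense video, the snippet below counts the values produced per frame in each case. The frame size and the number of tracked points are hypothetical choices for illustration, not the paper's configuration, and latent-space video models compress the dense side considerably, so the real gap is smaller than this raw count suggests.

```python
# Illustrative sizes only (hypothetical, not the paper's configuration).
frame_h, frame_w, channels = 256, 256, 3
num_points = 1024

dense_values = frame_h * frame_w * channels   # 196,608 values per RGB frame
sparse_values = num_points * 2                # 2,048 values per frame: (x, y) per point

print(dense_values, sparse_values, dense_values // sparse_values)
```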

Abstract

Accurately anticipating how complex, diverse scenes will evolve requires models that represent uncertainty, simulate along extended interaction chains, and efficiently explore many plausible futures. Yet most existing approaches rely on dense video or latent-space prediction, expending substantial capacity on dense appearance rather than on the underlying sparse trajectories of points in the scene. This makes large-scale exploration of future hypotheses costly and limits performance when long-horizon, multi-modal motion is essential. We address this by formulating the prediction of open-set future scene dynamics as step-wise inference over sparse point trajectories. Our autoregressive diffusion model advances these trajectories through short, locally predictable transitions, explicitly modeling the growth of uncertainty over time. This dynamics-centric representation enables fast…
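The step-wise formulation described in the abstract can be illustrated with a toy rollout. This sketch is not the authors' model. It assumes a hypothetical transition in which each point's next displacement is drawn by a DDIM (eta = 0) reverse pass, with an exact Gaussian denoiser (`gaussian_denoiser`) standing in for a trained network and a constant-velocity guess (`mu`) standing in for learned conditioning on the trajectory history. All function names, the noise schedule, and `sigma` are illustrative assumptions. Sampling many independent rollouts shows how per-step uncertainty compounds over the horizon.

```python
import numpy as np

def gaussian_denoiser(x_t, alpha_bar, mu, sigma):
    """Exact E[x0 | x_t] when x0 ~ N(mu, sigma^2) and x_t = sqrt(ab)*x0 + sqrt(1-ab)*eps.

    Hypothetical stand-in for a trained network, so the sampler runs without weights.
    """
    a = np.sqrt(alpha_bar)
    var_t = alpha_bar * sigma**2 + (1.0 - alpha_bar)
    return mu + a * sigma**2 / var_t * (x_t - a * mu)

def sample_transition(history, rng, num_diffusion_steps=50, sigma=0.05):
    """Sample one short transition (per-point displacement) with a DDIM (eta=0) reverse pass."""
    # Hypothetical conditioning: constant-velocity guess from the last two frames.
    mu = history[-1] - history[-2]
    betas = np.linspace(1e-4, 0.2, num_diffusion_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    x = rng.standard_normal(mu.shape)  # start from (approximately) pure noise
    for t in reversed(range(num_diffusion_steps)):
        ab = alpha_bars[t]
        x0_hat = gaussian_denoiser(x, ab, mu, sigma)
        eps_hat = (x - np.sqrt(ab) * x0_hat) / np.sqrt(1.0 - ab)
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_hat
    return x

def rollout(init_frames, num_steps, rng):
    """Autoregressively extend sparse point trajectories, one short step at a time."""
    frames = list(init_frames)                     # each frame: (num_points, 2)
    for _ in range(num_steps):
        delta = sample_transition(np.stack(frames[-2:]), rng)
        frames.append(frames[-1] + delta)          # feed the sample back in as history
    return np.stack(frames)                        # (2 + num_steps, num_points, 2)

rng = np.random.default_rng(0)
num_points = 16
start = rng.uniform(0.0, 1.0, size=(num_points, 2))
init = np.stack([start, start + 0.01])             # two observed frames with slight motion

# Explore many plausible futures by sampling independent rollouts.
futures = np.stack([rollout(init, num_steps=10, rng=rng) for _ in range(32)])
spread = futures.std(axis=0).mean(axis=(1, 2))     # per-frame spread across sampled futures
print(np.round(spread, 3))
```

Because each sampled displacement is fed back as history, small per-step deviations accumulate: the spread across sampled futures is zero for the two observed frames and grows over the predicted horizon. A learned model would replace both the toy denoiser and the constant-velocity prior.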

Code & Models

Models

CompVis/myriad (Hugging Face model)
