Forecasting Motion in the Wild
Neerja Thakkar, Shiry Ginosar, Jacob Walker, Jitendra Malik, Joao Carreira, Carl Doersch

TL;DR
This paper introduces an approach that uses dense point trajectories and a diffusion transformer to predict the complex, diverse motion of animals in the wild, improving generalization and data efficiency.
Contribution
It presents dense point trajectories as a structured mid-level representation of motion, enabling coherent forecasts across diverse non-rigid agents; the resulting model outperforms existing methods.
Findings
Forecasting trajectory tokens achieves category-agnostic, data-efficient prediction.
The method outperforms state-of-the-art baselines.
It generalizes to rare species and morphologies.
Abstract
Visual intelligence requires anticipating the future behavior of agents, yet vision systems lack a general representation for motion and behavior. We propose dense point trajectories as visual tokens for behavior, a structured mid-level representation that disentangles motion from appearance and generalizes across diverse non-rigid agents, such as animals in-the-wild. Building on this abstraction, we design a diffusion transformer that models unordered sets of trajectories and explicitly reasons about occlusion, enabling coherent forecasts of complex motion patterns. To evaluate at scale, we curate 300 hours of unconstrained animal video with robust shot detection and camera-motion compensation. Experiments show that forecasting trajectory tokens achieves category-agnostic, data-efficient prediction, outperforms state-of-the-art baselines, and generalizes to rare species and…
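The abstract names two structural ideas: trajectories serve as tokens, and a transformer operates over an unordered set of them. The NumPy sketch below illustrates both under stated assumptions and is not the authors' implementation. The token layout (per-frame x, y, and a visibility flag, flattened into one vector per track), the DDPM-style noising step, and the names `make_tokens`, `add_noise`, and `set_attention` are all hypothetical. The sketch shows why self-attention without positional encodings fits unordered trajectory sets: reordering the input tracks reorders the output in the same way.

```python
import numpy as np

def make_tokens(tracks, visible):
    """Hypothetical tokenization: one token per point trajectory.

    tracks:  (N, T, 2) point positions over T frames
    visible: (N, T) boolean occlusion flags (True = visible)
    Returns an (N, 3*T) array: per frame, (x, y, visibility) flattened.
    """
    n, t, _ = tracks.shape
    vis = visible.astype(tracks.dtype)[..., None]  # (N, T, 1) as 0.0 / 1.0
    return np.concatenate([tracks, vis], axis=-1).reshape(n, t * 3)

def add_noise(x0, alpha_bar, rng):
    """Assumed DDPM-style forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps

def set_attention(x, wq, wk, wv):
    """Single-head self-attention with NO positional encoding.

    Because no token carries an index, permuting the rows of x permutes
    the output rows identically (permutation equivariance over the set).
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
N, T, D = 8, 4, 16  # 8 tracks, 4 frames, 16-dim attention features (arbitrary)
tracks = rng.standard_normal((N, T, 2))
visible = rng.random((N, T)) > 0.2  # roughly 20% of points marked occluded
x0 = make_tokens(tracks, visible)
xt, eps = add_noise(x0, alpha_bar=0.5, rng=rng)
wq, wk, wv = (rng.standard_normal((3 * T, D)) for _ in range(3))

out = set_attention(xt, wq, wk, wv)
perm = rng.permutation(N)
out_perm = set_attention(xt[perm], wq, wk, wv)
print(np.allclose(out[perm], out_perm))  # True: shuffling tracks shuffles outputs
```

Treating the visibility flag as just another continuous channel to be noised is a simplification assumed here; this page does not say how the model encodes occlusion or which noise schedule it uses.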
